KR20230036249A

KR20230036249A - Anomaly recognition method and system based on LSTM

Info

Publication number: KR20230036249A
Application number: KR1020210118882A
Authority: KR
Inventors: 백성욱; 이미영; 울라 와심; 후세인 탄비어; 아마드 칸 줄피카
Original assignee: 세종대학교산학협력단
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2023-03-14
Also published as: KR102601233B1

Abstract

The present invention relates to an anomaly recognition method and system based on LSTM. The method includes the steps of: receiving a plurality of consecutive image frames to be determined for anomaly recognition; generating a feature vector representing characteristics of the plurality of image frames by providing the received image frames to a trained lightweight CNN model; and recognizing an abnormal activity corresponding to the plurality of image frames by providing the generated feature vector to a residual attention-based LSTM.

Description

LSTM-based anomaly recognition method and system {ANOMALY RECOGNITION METHOD AND SYSTEM BASED ON LSTM}

The present invention relates to an LSTM-based anomaly recognition method and system and, more specifically, to a method and system for recognizing abnormal activity contained in images using a lightweight CNN and a new type of LSTM.

A safe smart-city environment requires recognizing abnormal activities such as assaults, crimes, and road accidents in video captured by surveillance cameras. In real life, however, such abnormal activities are diverse, complex, and infrequent, so it is difficult for any system to recognize them accurately.

Meanwhile, for public safety it is important to recognize abnormal activities by analyzing video captured by surveillance cameras in real time. However, most surveillance cameras installed in public places support only recording and do not support real-time monitoring. Consequently, to recognize an abnormal activity, an expert must watch the recorded video directly and determine whether an abnormal activity has occurred. When an expert makes this judgment manually, the time required to recognize that an abnormal activity has occurred increases sharply.

To solve the above problems, the present invention provides an LSTM-based anomaly recognition method, a computer program stored on a recording medium, and a system (device).

The present invention can be implemented in a variety of ways, including as a method, a system (device), or a computer program stored on a readable storage (recording) medium.

According to an embodiment of the present invention, an LSTM-based anomaly recognition method performed by at least one processor includes: receiving a plurality of consecutive image frames to be evaluated for anomaly recognition; providing the received image frames to a trained lightweight CNN model to generate a feature vector representing features of the plurality of image frames; and providing the generated feature vector to a residual attention-based LSTM to recognize an abnormal activity corresponding to the plurality of image frames.

According to an embodiment of the present invention, the lightweight CNN includes a depth-wise separable convolution block. The convolution block includes a depth-wise convolution layer associated with a 3 x 3 filter and a point-wise convolution layer associated with a 1 x 1 filter for merging the values filtered by the depth-wise convolution layer.

According to an embodiment of the present invention, the lightweight CNN includes an expansion layer for expanding the number of channels of the input data.

According to an embodiment of the present invention, the residual attention-based LSTM includes a normalization layer that normalizes the feature vector in order to reduce training time.

According to an embodiment of the present invention, the residual attention-based LSTM is associated with a self-attention layer. The self-attention layer generates, based on the feature vector, a context-aware vector for the sequential features of the plurality of image frames.

According to an embodiment of the present invention, the residual attention-based LSTM is associated with a softmax layer for classifying abnormal activities. Recognizing the abnormal activity includes providing the generated feature vector to the residual attention-based LSTM and inputting the resulting output data to the softmax layer to recognize the abnormal activity.

According to an embodiment of the present invention, the residual attention-based LSTM is associated with an Adam optimization function and a cross-entropy loss function.

A computer program stored on a computer-readable recording medium is provided for executing, on a computer, the above-described LSTM-based anomaly recognition method according to an embodiment of the present invention.

An anomaly recognition system according to an embodiment of the present invention includes a communication module, a memory, and at least one processor connected to the memory and configured to execute at least one computer-readable program contained in the memory. The at least one program includes instructions for receiving a plurality of consecutive image frames to be evaluated for anomaly recognition, providing the received image frames to a trained lightweight CNN model to generate a feature vector representing features of the plurality of image frames, and providing the generated feature vector to a residual attention-based LSTM to recognize an abnormal activity corresponding to the plurality of image frames.

In various embodiments of the present invention, the anomaly recognition system can recognize abnormal activities occurring in real time with high accuracy by using a lightweight CNN and a new type of LSTM.

In various embodiments of the present invention, by using the self-attention layer, the residual attention-based LSTM can recognize abnormal activity efficiently from a single input such as the feature vector, unlike other models that require multiple inputs.

In various embodiments of the present invention, applying the Adam optimization function and the cross-entropy loss function can maximize abnormal activity recognition performance.

In various embodiments of the present invention, using residuals can effectively resolve the vanishing gradient problem, in which the weight for each parameter is distorted as training progresses.
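The residual idea can be illustrated with a minimal sketch. It assumes the common form y = F(x) + x, in which a layer's output is added to its own input so that the input (and its gradient) always has a direct path through the block. The names `residual` and `small_update` are hypothetical and are not taken from the patent.

```python
from typing import Callable, List

Vector = List[float]


def residual(layer: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Wrap a layer so its output is added to its input: y = F(x) + x."""
    def wrapped(x: Vector) -> Vector:
        return [f + xi for f, xi in zip(layer(x), x)]
    return wrapped


def small_update(x: Vector) -> Vector:
    # Hypothetical stand-in for a learned layer F(x).
    return [0.1 * xi for xi in x]


block = residual(small_update)
print(block([1.0, -2.0]))
```

Because the identity term contributes a derivative of 1 regardless of F, the gradient reaching earlier layers does not depend solely on the learned layer, which is the usual argument for why residual connections ease vanishing gradients.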

In various embodiments of the present invention, the residual attention-based LSTM can recognize abnormal activity in real time even with less computing power.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the claims by a person having ordinary skill in the art to which the present invention pertains (a "person skilled in the art").

BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention will be described with reference to the accompanying drawings described below, in which like reference numerals denote like elements, but are not limited thereto.
FIG. 1 is a diagram showing an example of an anomaly recognition system according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram showing the internal configuration of a lightweight CNN according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram showing the internal configuration of a residual attention-based LSTM according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating a feature extraction process of a residual attention-based LSTM according to an embodiment of the present invention.
FIG. 5 is an exemplary confusion matrix showing prediction results for abnormal activities of a residual attention-based LSTM according to an embodiment of the present disclosure.
FIG. 6 is a flowchart illustrating an example of an LSTM-based anomaly recognition method according to an embodiment of the present invention.
FIG. 7 is a block diagram showing the internal configuration of an anomaly recognition system according to an embodiment of the present invention.

Hereinafter, specific details for carrying out the present invention are described with reference to the accompanying drawings. In the following description, however, detailed descriptions of well-known functions or configurations are omitted where they might unnecessarily obscure the gist of the present invention.

In the accompanying drawings, identical or corresponding components are given the same reference numerals. In the description of the following embodiments, redundant descriptions of identical or corresponding components may be omitted. However, omitting the description of a component does not mean that the component is excluded from any embodiment.

Advantages and features of the disclosed embodiments, and methods of achieving them, will become apparent from the embodiments described below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that the present invention is complete and fully conveys the scope of the invention to a person skilled in the art.

The terms used in this specification are described briefly, and then the disclosed embodiments are described in detail. The terms used in this specification were selected, as far as possible, from general terms currently in wide use, in consideration of their functions in the present invention, but they may vary according to the intent of those working in the relevant field, precedent, the emergence of new technologies, and the like. In certain cases a term has been arbitrarily chosen by the applicant, in which case its meaning is set out in detail in the corresponding part of the description. Accordingly, a term used in the present invention should be defined based on its meaning and the overall content of the present invention, not merely on its name.

In this specification, singular expressions include plural expressions unless the context clearly specifies the singular. Likewise, plural expressions include singular expressions unless the context clearly specifies the plural. Throughout the specification, when a part is said to include a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In the present invention, terms such as "include" and "including" indicate the presence of features, steps, operations, elements, and/or components, but do not preclude the addition of one or more other functions, steps, operations, elements, components, and/or combinations thereof.

In the present invention, when a particular component is referred to as being "coupled", "combined", or "connected" to, or as "reacting" with, any other component, the particular component may be directly coupled, combined, and/or connected to, or may react with, the other component, but is not limited thereto. For example, one or more intermediate components may exist between the particular component and the other component. In addition, in the present invention, "and/or" may include each of one or more listed items or a combination of at least some of the one or more items.

FIG. 1 is a diagram showing an example of an anomaly recognition system 100 according to an embodiment of the present invention. The anomaly recognition system 100 may refer to a system that receives video captured by an image sensor (e.g., a surveillance camera) and recognizes, based on the received video, whether an abnormal activity has occurred. For example, the anomaly recognition system 100 may be included in an image sensor or associated with one or more image sensors. As shown, the anomaly recognition system 100 may include a lightweight convolutional neural network (CNN) 102 and a residual attention-based long short-term memory (LSTM) 104.

According to an embodiment, the lightweight CNN 102 included in the anomaly recognition system 100 may receive a plurality of consecutive image frames 110 to be evaluated for anomaly recognition. For example, the plurality of consecutive image frames 110 may be 30 consecutive image frames extracted from a video, but is not limited thereto. The lightweight CNN 102 may refer to an artificial neural network made lightweight to reduce the amount of computation, the computing power it requires, the computation time, and the like. For example, the lightweight CNN 102 may include MobileNet, in which large convolutional layers are replaced with depth-wise separable convolution blocks.

The anomaly recognition system 100 may provide the received plurality of image frames 110 to the trained lightweight CNN model to generate a feature vector 120 representing features of the plurality of image frames 110. In other words, the lightweight CNN 102 may receive the plurality of image frames 110 extracted from a video and generate the feature vector 120 corresponding to those image frames 110. For example, a plurality of sets of image frames may be extracted from one video, and the lightweight CNN 102 may generate a corresponding feature vector for each of these sets. Here, the feature vector 120 may be a vector generated based on feature points contained in each of the plurality of images, and the feature points may be extracted based on corners, edges, and the like in the image, but are not limited thereto.
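As a minimal sketch of how one video could yield several sets of consecutive frames, the following splits a frame sequence into fixed-size windows. The function name `split_into_frame_sets`, the non-overlapping stride, and the dropping of an incomplete final window are assumptions for illustration; the text only gives 30 consecutive frames as an example set size.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_frame_sets(frames: Sequence[Frame], set_size: int = 30) -> List[List[Frame]]:
    """Split a video's frames into consecutive, non-overlapping sets.

    Hypothetical helper: trailing frames that do not fill a whole set
    are dropped (an assumption, not something the source specifies).
    """
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    full_sets = len(frames) // set_size
    return [list(frames[i * set_size:(i + 1) * set_size]) for i in range(full_sets)]


# Integers stand in for decoded image frames.
video = list(range(95))
frame_sets = split_into_frame_sets(video)
print(len(frame_sets), frame_sets[1][0], frame_sets[-1][-1])
```

Each resulting set would then be passed to the CNN to obtain one feature vector per set.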

The anomaly recognition system 100 may provide the generated feature vector 120 to the residual attention-based LSTM 104 to recognize an abnormal activity corresponding to the plurality of image frames 110 (130). In other words, the residual attention-based LSTM 104 may receive the feature vector 120 from the lightweight CNN 102 and, based on the information contained in the feature vector 120, recognize the abnormal activity corresponding to the plurality of image frames 110. Here, the residual attention-based LSTM 104 is a residual LSTM associated with a self-attention layer and may be an artificial neural network that operates on sequential information (e.g., a plurality of consecutive image frames) rather than on a single piece of information. That is, based on the feature vector 120, the residual attention-based LSTM 104 can recognize whether an abnormal activity has occurred by considering the information contained in the plurality of consecutive image frames 110 rather than in a single image frame.
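The two-stage flow described above (feature extraction, then sequence classification) might be wired together as in the sketch below. Everything here is hypothetical: `recognize_activity`, `toy_cnn`, and `toy_lstm` are stand-ins, and extracting one feature vector per frame is an assumption; a real system would plug in trained models.

```python
from typing import Callable, List, Sequence

FeatureVector = List[float]


def recognize_activity(
    frames: Sequence[object],
    extract_features: Callable[[object], FeatureVector],
    classify_sequence: Callable[[List[FeatureVector]], str],
) -> str:
    """Run the two-stage pipeline: per-frame features, then sequence classification."""
    features = [extract_features(frame) for frame in frames]  # CNN stage
    return classify_sequence(features)                        # LSTM stage


# Stand-in components (hypothetical; not trained models).
def toy_cnn(frame):
    return [float(frame)]


def toy_lstm(features):
    mean = sum(f[0] for f in features) / len(features)
    return "abnormal" if mean > 0.5 else "normal"


print(recognize_activity([0.9, 0.8, 0.7], toy_cnn, toy_lstm))
```

Passing the two stages in as callables keeps the sketch independent of any particular deep learning framework.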

Although FIG. 1 has been described above with the anomaly recognition system 100 recognizing whether an abnormal activity has occurred, it is not limited thereto; the anomaly recognition system 100 may also recognize which type of abnormal activity has occurred. With this configuration, the anomaly recognition system 100 can recognize abnormal activities occurring in real time with high accuracy by using the lightweight CNN 102 and the new type of LSTM 104.

FIG. 2 is an exemplary diagram showing the internal configuration of a lightweight CNN 220 according to an embodiment of the present invention. As described above, the lightweight CNN 220 may receive a plurality of sets of image frames 210 (e.g., a plurality of image frames (i), a plurality of image frames (i+1), a plurality of image frames (N), etc.) and generate a feature vector 230 for each set of image frames. For example, the lightweight CNN 220 may be associated with MobileNetV2 or may be an artificial neural network generated based on MobileNetV2.

According to an embodiment, the lightweight CNN 220 may include depth-wise separable convolution blocks that replace convolutional layers. Such a convolution block may include a depth-wise convolution layer associated with a 3 x 3 filter and a point-wise convolution layer associated with a 1 x 1 filter for merging the values filtered by the depth-wise convolution layer. That is, the lightweight CNN 220 can be made lightweight by replacing large convolutional layers with these convolution blocks, and can therefore operate quickly with little computing power.
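To make the saving concrete, the sketch below counts weights for a standard convolution and for a depth-wise plus point-wise pair. The function names and the 32-to-64 channel example are illustrative assumptions, and bias terms are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise (kernel x kernel) plus point-wise (1 x 1) pair."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1 x 1 filters that merge the channels
    return depthwise + pointwise


standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(standard, separable, round(standard / separable, 2))
```

For this example the standard layer needs 18,432 weights and the separable pair 2,336, roughly an eightfold reduction.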

According to an embodiment, the lightweight CNN 220 may include an expansion layer for expanding the number of channels of the input data. For example, the expansion layer may expand the number of channels before the data is passed to the depth-wise convolution layer. In other words, the expansion layer may produce more output channels than the given number of input channels. In addition, the point-wise convolution layer may be part of a projection layer. That is, the point-wise convolution layer may pass the data on by projecting a large number of channels onto a smaller number of channels.
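The expand, filter, project sequence can be traced by following the channel count through one block. This is only a sketch: the function name is hypothetical, and the expansion factor of 6 is an assumed value chosen for illustration, not one stated in the text.

```python
def inverted_residual_channels(c_in: int, c_out: int, expansion: int = 6):
    """Return the channel count after each stage of an expand/filter/project block."""
    expanded = c_in * expansion   # expansion layer: 1 x 1 conv widens the channels
    filtered = expanded           # depth-wise conv keeps the channel count
    projected = c_out             # projection layer: 1 x 1 conv narrows the channels
    return [("input", c_in), ("expansion", expanded),
            ("depthwise", filtered), ("projection", projected)]


for stage, channels in inverted_residual_channels(24, 32):
    print(f"{stage:<10} {channels}")
```

The wide middle stage is where the cheap depth-wise filtering happens, while the tensors entering and leaving the block stay narrow.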

The lightweight CNN 220 may include a bottleneck residual block. Such a bottleneck residual block may reduce the number of parameters used to generate the feature vector 230 or reduce the number of matrix multiplications. The lightweight CNN 220 may also include ReLU, used as an activation function, and a normalization layer (e.g., a batch normalization layer). In addition, the lightweight CNN 220 may include a global average pooling layer, which reduces overfitting.
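Global average pooling can be sketched in a few lines: each channel's spatial grid is replaced by its mean, so the layer adds no trainable weights. The nested-list representation and the name `global_average_pool` are assumptions made to keep the example dependency-free.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel: rows x columns


def global_average_pool(channels: List[FeatureMap]) -> List[float]:
    """Collapse each channel's spatial grid to its mean, giving one value per channel."""
    pooled = []
    for fmap in channels:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.0, 0.0], [0.0, 8.0]],   # channel 1
]
print(global_average_pool(feature_maps))
```

Because the output length equals the number of channels, the pooled result can serve directly as a fixed-length feature vector.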

FIG. 3 is an exemplary diagram showing the internal configuration of a residual attention-based LSTM 310 according to an embodiment of the present invention. As described above, the residual attention-based LSTM 310 may receive the feature vector 230 and recognize an abnormal activity 330 based on the received feature vector 230. For example, an LSTM is a type of recurrent neural network (RNN) and may include a memory cell controlled by an input gate and a forget gate.

According to an embodiment, the cell state may be changed by the forget gate or modified by the input gate. Here, the forget gate may determine the amount of information and/or data that should be carried over from the previous memory to the next time step. That is, the extent to which the information and/or data stored in the previous memory is needed for computation at the next time step is determined, and that amount of information and/or data may be passed on. The input gate may determine the amount of new information and/or data that should be written into the memory cell. Using these input and forget gates, the LSTM can effectively handle long- and short-term dependencies in sequential information. Such an LSTM can be determined as in Equation 1 below.

[Equation 1]

Here, among the symbols of Equation 1 (not reproduced in this text), two may be parameters used to control and monitor the previous time step's information about the hidden state, and two may be parameters applied to the weight matrices of the current input time step. The superscript s may denote input sequence information, and the subscript t may denote time step information. One symbol denotes the sigmoid function, and another may be the Hadamard product operator, meaning element-wise multiplication. With this configuration, the LSTM can, through the interaction between the repeating modules at each time step, allow information obtained at an earlier step to persist into later steps, and can thereby solve the problem of long-term dependencies.
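For orientation, the sketch below implements one time step of a textbook LSTM cell with input, forget, and output gates and a tanh candidate. It is not a reconstruction of the patent's Equation 1: the gate set, the parameter layout, and all names (`lstm_step`, `params`, and so on) are assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One time step of a textbook LSTM cell.

    params maps each of "i", "f", "o", "g" to (input weights, recurrent weights, bias).
    """
    act = {}
    for name in ("i", "f", "o", "g"):
        w_x, w_h, b = params[name]
        pre = [a + r + bias for a, r, bias in zip(matvec(w_x, x), matvec(w_h, h_prev), b)]
        squash = math.tanh if name == "g" else sigmoid
        act[name] = [squash(p) for p in pre]
    # Forget gate scales the old cell state; input gate scales the new candidate.
    c = [f * c_old + i * g for f, c_old, i, g in zip(act["f"], c_prev, act["i"], act["g"])]
    # Output gate controls how much of the cell state reaches the hidden state.
    h = [o * math.tanh(c_t) for o, c_t in zip(act["o"], c)]
    return h, c


# With all-zero weights every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero = ([[0.0]], [[0.0]], [0.0])
h, c = lstm_step([1.0], [0.0], [2.0], {k: zero for k in "ifog"})
print(h, c)
```

The element-wise products in the cell-state update are the Hadamard products mentioned above, and the sigmoid keeps each gate between 0 and 1 so that it acts as a soft switch.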

According to an embodiment, the residual attention-based LSTM 310 may include a normalization layer. When information is normalized by the normalization layer, the training time of the residual attention-based LSTM 310 can be reduced. For example, normalization may be performed by Equation 2 below.

[Equation 2]

Here, among the symbols of Equation 2 (not reproduced in this text), one may be the hidden state of each layer of the LSTM 310 for the i-th neuron, and two may be trainable weights used to rescale the input sequence for the activation function f. In addition, t may denote the time step. For example, to reduce overfitting, a dropout threshold of 0.5 may be applied to each layer of the residual attention-based LSTM 310.
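The following is a minimal sketch of normalization with trainable rescaling, assuming the layer resembles standard layer normalization (zero mean and unit variance across the hidden units, followed by a gain and a bias). The names `layer_norm`, `gain`, `bias`, and `eps` are illustrative and are not taken from the patent's Equation 2.

```python
import math
from typing import List


def layer_norm(h: List[float], gain: List[float], bias: List[float],
               eps: float = 1e-5) -> List[float]:
    """Normalize a hidden-state vector to zero mean and unit variance, then rescale."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(h, gain, bias)]


hidden = [1.0, 2.0, 3.0]
normalized = layer_norm(hidden, gain=[1.0] * 3, bias=[0.0] * 3)
print([round(v, 3) for v in normalized])
```

Keeping activations in a consistent range at every time step is the usual reason such a layer shortens training.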

According to an embodiment, the residual attention-based LSTM 310 may be associated with a self-attention layer. In this case, the self-attention layer may generate, based on the feature vector 230, a context-aware vector for the sequential features of the plurality of image frames. Here, the context-aware vector may be a vector containing the sequential characteristics of the plurality of consecutive image frames (e.g., the correlations between features). By using such a self-attention layer, the residual attention-based LSTM 310 can recognize abnormal activity efficiently from a single input such as the feature vector 230, unlike other models that require multiple inputs.
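One simple way to build a context vector from a single sequence of frame features is attention pooling, sketched below. This is an illustrative variant and not necessarily the self-attention layer of the patent: the scoring rule (a learned vector `w` and offset `b`) and all function names are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    shifted = [s - max(scores) for s in scores]   # subtract max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(frame_features: List[Vector], w: Vector,
                      b: float = 0.0) -> Tuple[Vector, Vector]:
    """Weight per-frame features by a relevance score and sum them into one vector."""
    scores = [sum(wi * fi for wi, fi in zip(w, feat)) + b for feat in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [sum(a * feat[d] for a, feat in zip(weights, frame_features))
               for d in range(dim)]
    return context, weights


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_context(features, w=[0.0, 0.0])
print(weights, context)
```

With a zero scoring vector every frame receives equal weight and the context vector is just the average; a trained `w` would shift the weight toward the frames most relevant to the activity.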

According to an embodiment, the residual attention-based LSTM 310 may be associated with a softmax layer 320 for classifying abnormal activities, and the softmax layer 320 may recognize the abnormal activity based on the information output by the residual attention-based LSTM 310. For example, the residual attention-based LSTM 310 and/or the softmax layer 320 may recognize the abnormal activity by Equation 3 below.

[Equation 3]

Here, among the symbols of Equation 3 (not reproduced in this text), two are parameters learned for the frame features according to the attention weights, and one may denote the score associated with an abnormal activity. Another may denote the probability of the abnormal activity obtained from the softmax layer 320. In other words, the information on the plurality of image frames extracted by the residual attention-based LSTM 310 may be used to identify whether those image frames contain abnormal activity and/or normal activity, and the softmax layer 320 may recognize the abnormal activity through final classification and/or prediction based on the information output by the residual attention-based LSTM 310. In addition, the residual attention-based LSTM 310 may be associated with an Adam optimization function and a cross-entropy loss function. That is, applying the Adam optimization function and the cross-entropy loss function can maximize abnormal activity recognition performance.
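The classification step and its training loss can be sketched as follows. The class labels and scores are made up for illustration, and the functions are the standard softmax and cross-entropy definitions rather than the patent's Equation 3.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    shifted = [z - max(logits) for z in logits]   # subtract max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], target: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])


classes = ["normal", "assault", "road accident"]   # hypothetical label set
logits = [0.0, 2.0, 0.0]                            # hypothetical classifier scores
probs = softmax(logits)
predicted = classes[probs.index(max(probs))]
loss = cross_entropy(probs, target=1)
print(predicted, round(loss, 3))
```

During training, an optimizer such as Adam would adjust the network weights to lower this loss, which raises the probability assigned to the correct class.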

FIG. 4 is an exemplary diagram illustrating a feature extraction process 400 of the residual attention-based LSTM according to an embodiment of the present invention. In the illustrated example, the inputs shown for the successive time steps are the new inputs at each time step and may represent at least part of the feature vector extracted from the plurality of image frames. That is, the residual attention-based LSTM is executed repeatedly and may receive and process a new input at each time step. In this case, the LSTMs at the respective time steps may be associated with one another, and at least part of the information from a previous time step may be used at the next time step.

According to one embodiment, the residual attention-based LSTM may be associated with a self-attention layer. When associated with the self-attention layer, the residual attention-based LSTM can make effective use of earlier information even when the sequence of time steps becomes long, and the residual attention-based LSTM and/or the self-attention layer may determine which elements are related to one another among the consecutive pieces of information extracted from the plurality of image frames. The features associated with abnormal activity that the residual attention-based LSTM extracts from the plurality of image frames may then be concatenated and/or processed and passed to the softmax layer.
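As a rough illustration of how attention can weight the information from different time steps, the sketch below pools a sequence of hidden states into a single context vector. It is a simplified attention pooling, not the full self-attention layer of the embodiment: the dot-product scoring against a `query` vector is an assumption, and all values are hypothetical.

```python
import math

def softmax(values):
    """Normalize scores into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(hidden_states, query):
    """Weight each time step's hidden state and sum them into one context vector."""
    # Relevance of each time step: dot product with the (assumed) query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

# Hypothetical hidden states for three time steps, each of dimension 2.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attention_context(hidden_states, query)
```

Time steps whose hidden state aligns more strongly with the query receive larger weights, so they contribute more to the context vector regardless of how far back in the sequence they occurred.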

In addition, the LSTM may be a residual-based LSTM. Here, the residual characterizes the sequential information of the top layer and may be used to reconstruct a layer by finding a residual function given the input layer. For example, residual function learning may be determined by Equation 4 below.

[Equation 4]

Here, the input term may be the input data, and the output term may be the resulting sequential information vector of the layer. In addition, the residual term may represent the residual learned in the relevant layer. Such residual learning may produce a shortcut function for effective training between layers. With this configuration, using the residual can effectively address the vanishing gradient problem, in which the weights for each parameter are distorted as training progresses.
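The shortcut idea can be sketched in a few lines. Since Equation 4 is not reproduced in this text, the sketch assumes the common residual form, output = F(x) + x, where F is whatever the layer learns; `residual_block`, `zero_layer`, and `halve` are hypothetical names used only for illustration.

```python
def residual_block(x, layer):
    """Return layer(x) + x: the learned residual plus the identity shortcut."""
    fx = layer(x)
    if len(fx) != len(x):
        raise ValueError("shortcut requires matching input and output sizes")
    return [f + xi for f, xi in zip(fx, x)]

def zero_layer(x):
    """A layer that has learned nothing: its residual is zero."""
    return [0.0 for _ in x]

def halve(x):
    """A stand-in for a learned transformation."""
    return [0.5 * xi for xi in x]

x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_layer)  # input passes through unchanged
scaled_out = residual_block(x, halve)         # residual is added on top of x
```

Because the input is always added back, a layer whose residual is zero still passes its input forward intact, which gives gradients a direct path through the stack.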

FIG. 5 is an exemplary confusion matrix showing the prediction results of the residual attention-based LSTM for abnormal activity according to an embodiment of the present disclosure. In the illustrated example, the residual attention-based LSTM may use the UCF-Crime dataset to predict abnormal activity (e.g., whether an abnormal activity has occurred and/or the type of abnormal activity that occurred) based on a plurality of image frames. As shown, the residual attention-based LSTM may use the plurality of image frames to predict assault, explosion, fighting, normal, road accident, and the like with high accuracy.

Additionally, the performance of the residual attention-based LSTM may be measured as higher than that of other types of LSTM. In this regard, the performance of each of an LSTM, a bidirectional LSTM (BD-LSTM), a residual LSTM, and the residual attention-based LSTM was evaluated. As shown in Table 1 below, the evaluation was performed using the UCF-Crime, UMN, and Avenue datasets.

Table 1

| Model | Dataset | Recall (%) | Accuracy (%) | F1 score (%) | AUC (%) |
|---|---|---|---|---|---|
| LSTM | UCF-Crime | 86 | 74 | 77 | 88 |
| BD-LSTM | UCF-Crime | 79 | 84 | 76 | 87 |
| Residual LSTM | UCF-Crime | 91 | 78 | 82 | 95 |
| Residual attention-based LSTM | UCF-Crime | 78 | 87 | 81 | 96 |
| LSTM | UMN | 87 | 77 | 81 | 86 |
| BD-LSTM | UMN | 88 | 81 | 84 | 88 |
| Residual LSTM | UMN | 94 | 95 | 94 | 96 |
| Residual attention-based LSTM | UMN | 98 | 98 | 98 | 98 |
| LSTM | Avenue | 91 | 93 | 92 | 91 |
| BD-LSTM | Avenue | 94 | 95 | 94 | 94 |
| Residual LSTM | Avenue | 93 | 94 | 94 | 94 |
| Residual attention-based LSTM | Avenue | 98 | 99 | 99 | 98 |

As shown in Table 1, the residual attention-based LSTM outperformed the other conventional LSTMs on most metrics. In addition, as shown in Table 2 below, the residual attention-based LSTM is more efficient than other conventional models.

Table 2

| Model | Time complexity (seconds) | Model size (MB) | Parameters (millions) | FLOPs (mega) |
|---|---|---|---|---|
| VGG-16 | - | 528 | 138 | |
| VGG-19 | - | 549 | 143 | |
| FlowNet | - | 638.5 | 162.49 | |
| DEARESt | - | 1187.5 | 305.49 | |
| Residual attention-based LSTM | 0.263 | 12.8 | 3.3 | 618.3 |

That is, as shown in Table 2, the residual attention-based LSTM showed high performance in all respects: time complexity, model size, number of parameters, and FLOPs. Accordingly, the residual attention-based LSTM can recognize abnormal activity in real time even with less computing power. In addition, as shown in Table 3 below, the prediction accuracy of the residual attention-based LSTM is higher than that of other conventional models.

Table 3

| Model | Accuracy (%), UCF-Crime | Accuracy (%), UMN | Accuracy (%), Avenue |
|---|---|---|---|
| VGG-16 | 72.66 | - | - |
| VGG-19 | 71.66 | - | - |
| FlowNet | 71.33 | - | - |
| DEARESt | 76.66 | - | - |
| Nandedkar and Bansod | - | 96.99 | - |
| Kyung Joo Cheoi | - | 96.50 | 90.18 |
| Al-Dhamari et al. | - | 97.44 | - |
| Residual attention-based LSTM | 78.43 | 98.20 | 98.80 |

That is, the residual attention-based LSTM can recognize abnormal activity with higher accuracy than other conventional models while using less computing power.
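The size and parameter figures in Table 2 can be cross-checked with simple arithmetic. The sketch below assumes each parameter is stored as a 32-bit float (4 bytes), which the patent does not state; the parameter counts and model sizes are copied from Table 2, and `float32_size_mb` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights stored as 32-bit floats

def float32_size_mb(params_millions):
    """Approximate weight storage in MB (10^6 bytes) for 32-bit parameters."""
    return params_millions * 1e6 * BYTES_PER_FLOAT32 / 1e6

# (parameters in millions, reported model size in MB), copied from Table 2.
table2 = {
    "VGG-16": (138, 528),
    "VGG-19": (143, 549),
    "Residual attention-based LSTM": (3.3, 12.8),
}

# Estimated size per model under the float32 assumption.
estimates = {name: float32_size_mb(p) for name, (p, _) in table2.items()}

# How many times smaller the proposed model is than VGG-16.
param_ratio = table2["VGG-16"][0] / table2["Residual attention-based LSTM"][0]
size_ratio = table2["VGG-16"][1] / table2["Residual attention-based LSTM"][1]
```

Under this assumption the estimated sizes land within a few percent of the reported ones, and both ratios come out at roughly 41 to 42, so the reported model size is consistent with the reported parameter count.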

FIG. 6 is a flowchart illustrating an example of an LSTM-based anomaly recognition method 600 according to an embodiment of the present invention. The LSTM-based anomaly recognition method 600 may be performed by a processor (e.g., at least one processor of an anomaly recognition system). As shown, in the LSTM-based anomaly recognition method 600, the processor may receive a plurality of consecutive image frames subject to anomaly recognition (S610).

The processor may provide the received plurality of image frames to a trained lightweight CNN model to generate a feature vector representing characteristics of the plurality of image frames (S620). Here, the lightweight CNN may include a depthwise separable convolution block, and the convolution block may include a depthwise convolution layer associated with a 3 x 3 filter and a pointwise convolution layer associated with a 1 x 1 filter for merging the values filtered by the depthwise convolution layer. In addition, the lightweight CNN may include an expansion layer for expanding the number of channels of the input data.
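To see why a depthwise separable block keeps the CNN lightweight, the weight counts of the two designs can be compared directly. This is a minimal sketch that ignores bias terms; the channel counts (32 in, 64 out) are hypothetical and not taken from the patent.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise layer followed by a 1 x 1 pointwise layer."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 filters that merge the channels
    return depthwise + pointwise

c_in, c_out = 32, 64  # hypothetical channel counts
standard = standard_conv_params(c_in, c_out)
separable = depthwise_separable_params(c_in, c_out)
reduction = standard / separable
```

For these channel counts the separable block needs 2,336 weights against 18,432 for a standard 3 x 3 convolution, close to an eightfold reduction.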

The processor may provide the generated feature vector to the residual attention-based LSTM to recognize an abnormal activity corresponding to the plurality of image frames (S630). For example, the residual attention-based LSTM may include a normalization layer that normalizes the feature vector in order to reduce training time. In addition, the residual attention-based LSTM may be associated with a self-attention layer. Here, the self-attention layer may generate, based on the feature vector, a context-aware vector for the consecutive features of the plurality of image frames.
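The patent does not specify which kind of normalization the normalization layer applies. As one plausible reading, the sketch below rescales a feature vector to zero mean and unit variance, in the style of layer normalization without learned scale and shift; `normalize_features` and the sample values are hypothetical.

```python
import math

def normalize_features(vector, eps=1e-5):
    """Rescale a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    scale = math.sqrt(variance + eps)  # eps guards against division by zero
    return [(v - mean) / scale for v in vector]

features = [2.0, 4.0, 6.0, 8.0]  # hypothetical feature vector
normalized = normalize_features(features)
```

Keeping inputs on a common scale tends to make gradient-based training converge in fewer steps, which matches the stated goal of reducing training time.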

According to one embodiment, the residual attention-based LSTM may be associated with a softmax layer for classifying abnormal activity. In this case, the processor may recognize the abnormal activity by inputting, to the softmax layer, the output data obtained by providing the generated feature vector to the residual attention-based LSTM. Here, the residual attention-based LSTM may be associated with an Adam optimization function and a cross-entropy loss function.
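For readers unfamiliar with the Adam optimization function, a single-parameter version of its update rule is sketched below. The update follows the standard published Adam algorithm; the toy objective, learning rate, and step count are hypothetical choices for illustration and are not taken from the patent.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective: minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

In actual training, the gradient would come from the cross-entropy loss with respect to each network weight, and the same update would be applied to every parameter independently.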

FIG. 7 is a block diagram showing the internal configuration of an anomaly recognition system 700 according to an embodiment of the present invention. The anomaly recognition system 700 may include a memory 710, a processor 720, a communication module 730, and an input/output interface 740. As shown in FIG. 7, the anomaly recognition system 700 may be configured to communicate information and/or data over a network using the communication module 730.

The memory 710 may include any non-transitory computer-readable recording medium. According to one embodiment, the memory 710 may include a permanent mass storage device such as a random access memory (RAM), a read-only memory (ROM), a disk drive, a solid state drive (SSD), or a flash memory. As another example, a permanent mass storage device such as a ROM, SSD, flash memory, or disk drive may be included in the anomaly recognition system 700 as a separate permanent storage device distinct from the memory. In addition, an operating system and at least one program code may be stored in the memory 710.

These software components may be loaded from a computer-readable recording medium separate from the memory 710. Such a separate computer-readable recording medium may include a recording medium directly connectable to the anomaly recognition system 700, for example, a floppy drive, disk, tape, DVD/CD-ROM drive, memory card, or the like. As another example, the software components may be loaded into the memory 710 through the communication module 730 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memory 710 based on a computer program installed from files that developers, or a file distribution system that distributes application installation files, provide through the communication module 730.

The processor 720 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to a user terminal (not shown) or another external system by the memory 710 or the communication module 730.

The communication module 730 may provide a configuration or function that allows an external device and/or a user terminal (not shown) and the anomaly recognition system 700 to communicate with each other over a network, and may provide a configuration or function that allows the anomaly recognition system 700 to communicate with an external system (e.g., a separate cloud system). For example, control signals, instructions, data, and the like provided under the control of the processor 720 of the anomaly recognition system 700 may pass through the communication module 730 and the network and be transmitted to a user terminal and/or an external system via the communication module of that user terminal and/or external system.

In addition, the input/output interface 740 of the anomaly recognition system 700 may be a means for interfacing with an input or output device (not shown) that is connected to, or may be included in, the anomaly recognition system 700. Although FIG. 7 shows the input/output interface 740 as an element configured separately from the processor 720, the configuration is not limited thereto, and the input/output interface 740 may be included in the processor 720. The anomaly recognition system 700 may include more components than those shown in FIG. 7. However, most conventional components need not be explicitly illustrated.

The processor 720 of the anomaly recognition system 700 may be configured to manage, process, and/or store information and/or data received from a plurality of user terminals and/or a plurality of external systems.

The above-described method and/or various embodiments may be realized with digital electronic circuitry, computer hardware, firmware, software, and/or combinations thereof. Various embodiments of the present invention may be executed by a data processing device, for example, one or more programmable processors and/or one or more computing devices, or may be implemented as a computer-readable recording medium and/or a computer program stored on a computer-readable recording medium. The computer program may be written in any form of programming language, including compiled or interpreted languages, and may be distributed in any form, such as a stand-alone program, module, or subroutine. The computer program may be distributed through a single computing device, a plurality of computing devices connected through the same network, and/or a plurality of computing devices distributed so as to be connected through a plurality of different networks.

The above-described method and/or various embodiments may be performed by one or more processors configured to execute one or more computer programs that process, store, and/or manage arbitrary functions by operating on input data or generating output data. For example, the method and/or various embodiments of the present invention may be performed by special-purpose logic circuitry such as a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC), and an apparatus and/or system for performing the method and/or embodiments of the present invention may be implemented as special-purpose logic circuitry such as an FPGA or ASIC.

The one or more processors that execute the computer program may include general-purpose or special-purpose microprocessors and/or one or more processors of any kind of digital computing device. The processor may receive instructions and/or data from a read-only memory or a random access memory, or from both. In the present invention, the components of a computing device that performs the method and/or embodiments may include one or more processors for executing instructions and one or more memory devices for storing instructions and/or data.

According to one embodiment, the computing device may exchange data with one or more mass storage devices for storing data. For example, the computing device may receive data from, and/or transmit data to, a magnetic disc or an optical disc. Computer-readable storage media suitable for storing instructions and/or data associated with a computer program may include, but are not limited to, any form of non-volatile memory, including semiconductor memory devices such as an erasable programmable read-only memory (EPROM), an electrically erasable PROM (EEPROM), and a flash memory device. For example, the computer-readable storage media may include magnetic disks such as internal hard disks or removable disks, magneto-optical disks, and CD-ROM and DVD-ROM disks.

To provide interaction with a user, the computing device may include, but is not limited to, a display device (e.g., a cathode ray tube (CRT) or a liquid crystal display (LCD)) for presenting or displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input and/or commands to the computing device. That is, the computing device may further include any other kind of device for providing interaction with the user. For example, the computing device may provide the user with any form of sensory feedback, including visual feedback, auditory feedback, and/or tactile feedback, for interaction with the user. In turn, the user may provide input to the computing device through various gestures such as sight, voice, and motion.

In the present invention, various embodiments may be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), and/or a front-end component. In this case, the components may be interconnected by any form or medium of digital data communication, such as a communication network. For example, the communication network may include a local area network (LAN), a wide area network (WAN), and the like.

A computing device based on the exemplary embodiments described herein may be implemented using hardware and/or software configured to interact with a user, including a user device, a user interface (UI) device, a user terminal, or a client device. For example, the computing device may include a portable computing device such as a laptop computer. Additionally or alternatively, the computing device may include, but is not limited to, a personal digital assistant (PDA), a tablet PC, a game console, a wearable device, an internet of things (IoT) device, a virtual reality (VR) device, an augmented reality (AR) device, and the like. The computing device may further include other types of devices configured to interact with a user. In addition, the computing device may include a portable communication device (e.g., a mobile phone, a smartphone, or a wireless cellular phone) suitable for wireless communication over a network such as a mobile communication network. The computing device may be configured to communicate wirelessly with a network server using wireless communication technologies and/or protocols such as radio frequency (RF), microwave frequency (MWF), and/or infrared ray frequency (IRF).

The various embodiments in the present invention, including specific structural and functional details, are exemplary. Accordingly, embodiments of the present invention are not limited to those described above and may be implemented in various other forms. In addition, the terms used in the present invention are intended to describe some embodiments and are not to be construed as limiting the embodiments. For example, singular forms and the word "the" may be construed to include the plural as well, unless the context clearly indicates otherwise.

In the present invention, unless otherwise defined, all terms used in this specification, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which these concepts belong. In addition, commonly used terms, such as those defined in dictionaries, should be interpreted as having a meaning consistent with their meaning in the context of the related art.

Although the present invention has been described herein in connection with some embodiments, various modifications and changes may be made without departing from the scope of the present invention as understood by a person of ordinary skill in the art to which the invention belongs. Such modifications and changes should also be considered to fall within the scope of the claims appended hereto.

100: anomaly recognition system
102: lightweight CNN
104: residual attention-based LSTM
110: image frame
120: feature vector
130: abnormal activity recognition

Claims

An LSTM-based anomaly recognition method performed by at least one processor, the method comprising:
receiving a plurality of consecutive image frames subject to anomaly recognition;
generating a feature vector representing characteristics of the plurality of image frames by providing the received plurality of image frames to a trained lightweight convolutional neural network (CNN) model; and
recognizing an abnormal activity corresponding to the plurality of image frames by providing the generated feature vector to a residual attention-based long short-term memory (LSTM).

The LSTM-based anomaly recognition method of claim 1,
wherein the lightweight CNN includes a depthwise separable convolution block, and
wherein the convolution block includes a depthwise convolution layer associated with a 3 × 3 filter and a pointwise convolution layer associated with a 1 × 1 filter for merging values filtered by the depthwise convolution layer.

The LSTM-based anomaly recognition method of claim 1,
wherein the lightweight CNN includes an expansion layer for expanding the number of channels of input data.

The LSTM-based anomaly recognition method of claim 1,
wherein the residual attention-based LSTM includes a normalization layer for normalizing the feature vector to reduce training time.

The LSTM-based anomaly recognition method of claim 1,
wherein the residual attention-based LSTM is associated with a self-attention layer, and
wherein the self-attention layer generates, based on the feature vector, a context-aware vector for consecutive features of the plurality of image frames.

The LSTM-based anomaly recognition method of claim 1,
wherein the residual attention-based LSTM is associated with a softmax layer for classifying abnormal activity, and
wherein recognizing the abnormal activity comprises:
recognizing the abnormal activity by inputting, to the softmax layer, output data obtained by providing the generated feature vector to the residual attention-based LSTM.

The LSTM-based anomaly recognition method of claim 1,
wherein the residual attention-based LSTM is associated with an Adam optimization function and a cross-entropy loss function.

A computer program stored in a computer-readable recording medium for executing, on a computer, the LSTM-based anomaly recognition method according to any one of claims 1 to 7.

An anomaly recognition system comprising:
a communication module;
a memory; and
at least one processor connected to the memory and configured to execute at least one computer-readable program contained in the memory,
wherein the at least one program includes instructions for:
receiving a plurality of consecutive image frames subject to anomaly recognition;
generating a feature vector representing characteristics of the plurality of image frames by providing the received plurality of image frames to a trained lightweight CNN model; and
recognizing an abnormal activity corresponding to the plurality of image frames by providing the generated feature vector to a residual attention-based LSTM.