KR102596738B1

KR102596738B1 - An Autonomous Decision-Making Framework for Gait Recognition Systems Against Adversarial Attack Using Reinforcement Learning

Info

Publication number: KR102596738B1
Application number: KR1020220182553A
Authority: KR
Inventors: 노승민; 여상수; 맥수드 무아잠; 야스민 사다프; 메흐무드 어판; 아딜 파르한
Original assignee: 중앙대학교 산학협력단
Priority date: 2022-12-23
Filing date: 2022-12-23
Publication date: 2023-10-31

Abstract

An adversarial attack method using reinforcement learning against a gait recognition system according to a preferred embodiment of the present invention, and a device and computer program for performing it, carry out an adversarial attack on a deep learning-based gait recognition system using reinforcement learning. By analyzing the vulnerabilities of the gait recognition system in this way, the safety of the gait recognition system can be verified more accurately.

Description

An Autonomous Decision-Making Framework for Gait Recognition Systems Against Adversarial Attack Using Reinforcement Learning

The present invention relates to an autonomous decision-making framework for gait recognition systems against adversarial attacks using reinforcement learning, and more specifically to a method, device, and computer program that attack a deep learning-based gait recognition system in order to verify its safety.

Over the past few decades, advances in artificial intelligence (AI), machine learning (ML), and deep learning (DL) have expanded the use of autonomous systems in almost every domain. The security of the deep neural networks (DNNs) behind these systems must therefore be examined carefully, and determining how stable such DNNs are under adversarial attack is becoming increasingly important.

Since the early observation that neural networks are vulnerable to adversarial attacks, studying AI advances from the attacker's perspective has steadily gained attention, and researchers keep designing new variants of attack and defense methods to probe the weaknesses and robustness of various systems. More precisely, the resilience of a model to potentially manipulated data samples has become an important design goal in this line of work. Trained models classify unseen samples well and with high confidence, yet recent studies show that an attacker can often alter an input sample so that the trained model produces an incorrect result. This holds even when the alteration or perturbation of the image goes unnoticed by a human while the model predicts the wrong class. Security considerations aside, this shows that the deep learning (DL) algorithms in question do not properly capture the underlying premise. Attacks are classified as black box or white box, and as targeted or non-targeted.

In a white box attack, potentially harmful samples are computed with access to the model's parameters and its structure or architecture. The adversary is computed using information extracted by calculating the model's gradient for a given sample. Examples of white box attacks include the fast gradient sign method (FGSM), momentum iterative FGSM, and iterative FGSM. Several further variants of FGSM have also been designed, such as the one-step target class method, the basic iterative method, and the iterative least-likely class method. Other approaches compute the Jacobian matrix of the underlying model, in addition to gradient information, to generate adversarial samples. Still others iteratively compute a minimal perturbation for a given input sample, continuing until no decision boundary closer to the normal sample can be found, and thereby find an adversary just beyond the boundary.
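For orientation, the single-step FGSM update mentioned above is usually written as follows in the literature. The symbols are the conventional ones and do not come from this document: x is the clean input, y its label, θ the model parameters, J the training loss, and ε the perturbation budget.

```latex
x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\!\left(\nabla_{x} J(\theta, x, y)\right)
```

The gradient with respect to the input is what makes this a white box method: it cannot be evaluated without access to the model's parameters.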

In contrast, black box attacks are more powerful and harder to carry out. In a black box scenario, adversarial samples are generally computed without gathering information about the underlying model's parameters, gradients, or structure. For example, a hinge-like loss function and symmetric difference are used to compute adversarial samples. An optimization problem based on a differential evolution algorithm has also been used to find an optimal perturbation of size 1, which is why that attack is called the one-pixel attack. Similarly, natural generative adversarial networks are used to generate adversarial samples for which the distance between internal-layer representations is reduced.

Consequently, these attacks can be modeled in various settings, such as targeted and non-targeted attacks. In a targeted attack, the attacker crafts an adversarial sample from an input sample so that the model outputs a desired class. In a non-targeted attack, by contrast, the model predicts an arbitrary wrong class from the set of classes it was trained on. Existing research suggests that black box attacks, whether targeted or non-targeted, are the more natural and practical setting: attackers generally lack insight into the model architecture and parameters of AI-based systems designed for safety and surveillance, so white box attacks are less practical and less successful against such systems. Many researchers have proposed studies that exploit vulnerabilities in a range of AI, DL, and computer vision use cases and problems. Such investigations establish how robust a deep learning (DL) model is in an environment under adversarial attack before it is deployed in real-world applications, especially safety-critical systems.

The object of the present invention is to provide an adversarial attack method using reinforcement learning against a gait recognition system, which carries out an adversarial attack on a deep learning-based gait recognition system using reinforcement learning, and a device and computer program for performing the method.

Other objects of the present invention that are not stated explicitly may additionally be considered within the scope that can readily be inferred from the following detailed description and its effects.

To achieve the above technical object, an adversarial attack method using reinforcement learning against a gait recognition system according to a preferred embodiment of the present invention includes: obtaining a deep learning-based gait recognition system to be analyzed; and performing an adversarial attack, which is a black box attack, on the gait recognition system using reinforcement learning (RL).

Here, the gait recognition system may be an intelligent model based on a deep convolutional neural network (DCNN) that recognizes individuals by their walking style using a gait representation based on the gait energy image (GEI).

Here, the step of performing the adversarial attack may consist of performing the adversarial attack on the gait recognition system using reinforcement learning (RL) in which the gait energy image (GEI) with which the agent interacts is the environment, a pixel location in the GEI is the state, every move available to the agent from its current state is an action, and an action quality value determined from the prediction result returned by the intelligent model is the reward.

Here, the actions may include the agent moving to the pixel above, the pixel below, the pixel to the left of, and the pixel to the right of the pixel location given by its current state.

Here, an action may be the agent moving to another pixel, relative to the pixel location given by its current state, in steps of a preset size.
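The four moves and the preset step size can be sketched as a small transition function. This is an illustrative sketch only: the names `ACTIONS` and `move` are hypothetical, and clamping at the image border is an assumption, since the text does not say how a move that would leave the GEI is handled.

```python
from typing import Tuple

# The four moves named in the text, as (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(state: Tuple[int, int], action: str, step: int,
         shape: Tuple[int, int]) -> Tuple[int, int]:
    """Return the pixel location reached by moving `step` pixels in one direction.

    Clamping to the image border is an assumption of this sketch.
    """
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row * step, 0), shape[0] - 1)
    col = min(max(state[1] + d_col * step, 0), shape[1] - 1)
    return (row, col)
```

With a step of 4 on a 64×64 image, for example, `move((10, 10), "up", 4, (64, 64))` moves the agent four rows up while leaving the column unchanged.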

Here, the step of performing the adversarial attack may consist of performing the adversarial attack on the gait recognition system based on Q-learning.
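As a reminder of what Q-learning involves, the sketch below shows the standard tabular update and an epsilon-greedy action choice over pixel-location states. It is not the algorithm of Figure 8: the learning rate `alpha`, discount `gamma`, exploration rate `epsilon`, and all function names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

N_ACTIONS = 4  # up, down, left, right


def make_q_table():
    """Q-table mapping a pixel location (row, col) to one value per action."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q-value.

    alpha and gamma are illustrative defaults, not values from the patent.
    """
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice among the four moves (exploration scheme assumed)."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q[state]
    return values.index(max(values))
```

A dictionary keyed by pixel location keeps the table sparse, which suits an agent that visits only part of the image during an episode.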

Here, the step of performing the adversarial attack may consist of performing the adversarial attack on the gait recognition system through a process of generating an adversarial patch of a preset size based on the action taken by the agent, providing the intelligent model with an adversarial gait energy image (GEI) to which the adversarial patch has been added, and determining the reward for the action based on the prediction result for the adversarial GEI returned by the intelligent model.

Here, the step of performing the adversarial attack may consist of generating the adversarial patch as an n×n patch of randomly generated pixel values centered on the pixel location that is the agent's state after the action it has taken, and adding the adversarial patch to the gait energy image (GEI) at the patch's pixel location to obtain the adversarial GEI with perturbed pixels.
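The patch step can be illustrated with NumPy. Several details here are assumptions rather than statements from the text: the GEI is taken to be an 8-bit grayscale array, the random values are drawn uniformly from 0 to 255, "adding" the patch is read as overwriting the covered pixels (an additive blend with clipping would be another reading), and a patch that would cross the border is shifted inward. The name `apply_patch` is hypothetical.

```python
import numpy as np


def apply_patch(gei, center, n, rng):
    """Return a copy of `gei` with a random n x n patch centred on `center`.

    Assumes an 8-bit grayscale GEI at least n pixels wide and tall.
    """
    rows, cols = gei.shape
    half = n // 2
    # Shift the patch inward if it would cross the border (assumption).
    top = min(max(center[0] - half, 0), rows - n)
    left = min(max(center[1] - half, 0), cols - n)
    patch = rng.integers(0, 256, size=(n, n), dtype=np.uint8)
    adversarial = gei.copy()  # leave the clean GEI untouched
    adversarial[top:top + n, left:left + n] = patch
    return adversarial
```

A caller would pass the agent's current pixel location as `center` and a generator such as `np.random.default_rng()` as `rng`, then send the returned image to the model as the adversarial GEI.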

Here, the step of performing the adversarial attack may consist of assigning a positive action quality value as the reward for the action if the prediction result for the adversarial gait energy image (GEI) returned by the intelligent model lowers the confidence of the intelligent model, and assigning a negative action quality value as the reward for the action if it does not.
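The reward rule reduces to a comparison of two confidence values. In this sketch the reference point is the model's confidence on the GEI before the patch was applied, and the magnitudes +1 and -1 are placeholders; the text fixes only the signs, so both choices are assumptions, as is the function name.

```python
def action_reward(reference_confidence: float, adversarial_confidence: float,
                  positive: float = 1.0, negative: float = -1.0) -> float:
    """Return a positive value if the patch lowered the model's confidence, else a negative one.

    The reference confidence and the +/-1 magnitudes are assumptions of this sketch.
    """
    if adversarial_confidence < reference_confidence:
        return positive
    return negative
```

Because only the model's output confidence is consulted, the rule fits the black box setting: no gradients or parameters are needed.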

To achieve the above technical object, a computer program according to a preferred embodiment of the present invention is stored on a computer-readable storage medium and causes a computer to execute any one of the adversarial attack methods using reinforcement learning against a gait recognition system described above.

To achieve the above technical object, an adversarial attack device using reinforcement learning against a gait recognition system according to a preferred embodiment of the present invention includes: a memory storing one or more programs for performing an adversarial attack on a deep learning-based gait recognition system using reinforcement learning (RL); and one or more processors that, according to the one or more programs stored in the memory, perform operations for carrying out the adversarial attack on the gait recognition system using the reinforcement learning, wherein the processor obtains the gait recognition system to be analyzed and performs the adversarial attack, which is a black box attack, on the gait recognition system using the reinforcement learning (RL).

Here, the processor may perform the adversarial attack on the gait recognition system using reinforcement learning (RL) in which the gait energy image (GEI) with which the agent interacts is the environment, a pixel location in the GEI is the state, every move available to the agent from its current state is an action, and an action quality value determined from the prediction result returned by the intelligent model is the reward.

Here, the processor may perform the adversarial attack on the gait recognition system based on Q-learning.

Here, the processor may perform the adversarial attack on the gait recognition system through a process of generating an adversarial patch of a preset size based on the action taken by the agent, providing the intelligent model with an adversarial gait energy image (GEI) to which the adversarial patch has been added, and determining the reward for the action based on the prediction result for the adversarial GEI returned by the intelligent model.

Here, the processor may generate the adversarial patch as an n×n patch of randomly generated pixel values centered on the pixel location that is the agent's state after the action it has taken, and add the adversarial patch to the gait energy image (GEI) at the patch's pixel location to obtain the adversarial GEI with perturbed pixels.

Here, the processor may assign a positive action quality value as the reward for the action if the prediction result for the adversarial gait energy image (GEI) returned by the intelligent model lowers the confidence of the intelligent model, and may assign a negative action quality value as the reward for the action if it does not.

According to the adversarial attack method using reinforcement learning against a gait recognition system, and the device and computer program for performing it, according to a preferred embodiment of the present invention, an adversarial attack is carried out on a deep learning-based gait recognition system using reinforcement learning, so that the vulnerabilities of the gait recognition system can be analyzed and its safety verified more accurately.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

Figure 1 is a block diagram illustrating an adversarial attack device using reinforcement learning against a gait recognition system according to a preferred embodiment of the present invention.
Figure 2 is a flowchart illustrating an adversarial attack method using reinforcement learning against a gait recognition system according to a preferred embodiment of the present invention.
Figure 3 is a diagram illustrating the step of performing the adversarial attack shown in Figure 2.
Figure 4 is a diagram illustrating an example of mapping reinforcement learning onto the presented problem according to a preferred embodiment of the present invention.
Figure 5 is a diagram illustrating the adversarial attack process using reinforcement learning according to a preferred embodiment of the present invention.
Figure 6 is a diagram illustrating an adversarial attack on a surveillance system based on human gait analysis according to a preferred embodiment of the present invention.
Figure 7 is a diagram illustrating the interaction between the agent and the environment according to a preferred embodiment of the present invention.
Figure 8 is a diagram illustrating the algorithm of the adversarial attack using Q-learning according to a preferred embodiment of the present invention.
Figure 9 is a table showing the results of the gait recognition model according to a preferred embodiment of the present invention.
Figure 10 is a table showing the results of the gait recognition model under adversarial attack according to a preferred embodiment of the present invention.
Figure 11 is a density plot of the confidence values for each gait energy image of a probe set containing all three walking conditions according to a preferred embodiment of the present invention.
Figure 12 is a table showing the results of the gait recognition model under adversarial attack with different step sizes according to a preferred embodiment of the present invention.
Figure 13 is a diagram showing an example of a clean gait energy image and an adversarial gait energy image according to a preferred embodiment of the present invention.
Figure 14 is a diagram showing the attack success rate for all walking conditions according to a preferred embodiment of the present invention.
Figure 15 is a table showing the results of numerous training episodes for one individual according to a preferred embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains; the invention is defined solely by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive sense unless expressly and specifically defined.

In this specification, terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights is not to be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

In this specification, the labels attached to steps (e.g., a, b, c) are used for convenience of explanation and do not describe the order of the steps. Unless the context clearly specifies a particular order, the steps may occur in an order different from the one stated. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

In this specification, expressions such as "has," "may have," "includes," or "may include" indicate the presence of the corresponding feature (e.g., a numerical value, function, operation, or a component such as a part) and do not exclude the presence of additional features.

Preferred embodiments of the adversarial attack method using reinforcement learning against a gait recognition system according to the present invention, and of the device and computer program for performing it, are described in detail below with reference to the accompanying drawings.

First, an adversarial attack device using reinforcement learning against a gait recognition system according to a preferred embodiment of the present invention is described with reference to Figure 1.

Figure 1 is a block diagram illustrating an adversarial attack device using reinforcement learning against a gait recognition system according to a preferred embodiment of the present invention.

Referring to Figure 1, the adversarial attack device using reinforcement learning against a gait recognition system according to a preferred embodiment of the present invention (hereinafter, the "adversarial attack device") (100) can perform an adversarial attack on a deep learning (DL)-based gait recognition system using reinforcement learning (RL).

Accordingly, the present invention can analyze the vulnerabilities of the gait recognition system and thereby verify its safety more accurately.

To this end, the adversarial attack device (100) may include one or more processors (110), a computer-readable storage medium (130), and a communication bus (150).

The processor (110) may control the operation of the adversarial attack device (100). For example, the processor (110) may execute one or more programs (131) stored in the computer-readable storage medium (130). The one or more programs (131) may include one or more computer-executable instructions which, when executed by the processor (110), cause the adversarial attack device (100) to perform operations for carrying out an adversarial attack on the gait recognition system using reinforcement learning (RL) in order to verify the safety of the gait recognition system.

The computer-readable storage medium (130) is configured to store computer-executable instructions, program code, program data, and/or other suitable forms of information for carrying out an adversarial attack on the gait recognition system using reinforcement learning (RL). The program (131) stored in the computer-readable storage medium (130) includes a set of instructions executable by the processor (110). In one embodiment, the computer-readable storage medium (130) may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination of the two), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the adversarial attack device (100) and can store the desired information, or a suitable combination of these.

The communication bus (150) interconnects the various other components of the adversarial attack device (100), including the processor (110) and the computer-readable storage medium (130).

The adversarial attack device 100 may also include one or more input/output interfaces 170, which provide an interface for one or more input/output devices, and one or more communication interfaces 190. The input/output interface 170 and the communication interface 190 are connected to the communication bus 150. An input/output device (not shown) may be connected to the other components of the adversarial attack device 100 through the input/output interface 170.

Next, an adversarial attack method using reinforcement learning for a gait recognition system according to a preferred embodiment of the present invention is described with reference to FIGS. 2 and 3.

FIG. 2 is a flowchart illustrating an adversarial attack method using reinforcement learning for a gait recognition system according to a preferred embodiment of the present invention, and FIG. 3 is a diagram illustrating the adversarial attack step shown in FIG. 2.

Referring to FIG. 2, the processor 110 of the adversarial attack device 100 may acquire the deep learning (DL)-based gait recognition system to be analyzed (S110).

Here, the gait recognition system may be an intelligent model based on a deep convolutional neural network (DCNN) that recognizes individuals by their gait style using a gait representation based on the gait energy image (GEI). That is, the gait recognition system can obtain a gait representation from video captured by a camera and identify the corresponding person from that representation. The gait recognition system may also be a surveillance system based on human gait analysis that goes beyond recognizing individuals from gait representations and monitors for intruders.

Then, the processor 110 may perform an adversarial attack, in the form of a black-box attack, on the gait recognition system using reinforcement learning (RL) (S120).

Here, a black-box attack is an attack on the gait recognition system carried out without knowledge of the parameters, gradients, or structure of the underlying model of the targeted gait recognition system.

That is, the processor 110 may perform the adversarial attack on the gait recognition system using reinforcement learning (RL) in which the environment is the gait energy image (GEI) with which the agent interacts, the state is a pixel location in the GEI, the actions are all moves available to the agent in the current state, and the reward is an action quality value determined from the prediction result returned by the intelligent model.

Here, the actions may include moving the agent to a pixel above, below, to the left of, or to the right of the pixel location of the current state. Each action may move the agent from the current pixel location to another pixel in units of a step of preset size. For example, with a step size of 1, an action moves the agent 1 step (i.e., 1 pixel) from the current pixel location; with a step size of 3, it moves the agent 3 steps (i.e., 3 pixels); and with a step size of 5, it moves the agent 5 steps (i.e., 5 pixels).
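
The four moves and the configurable step size can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the names `ACTIONS` and `move` are hypothetical, the 240×240 grid is taken from the GEI size given in the description, and clamping at the image border is an assumption because the description does not say how moves off the image are handled.

```python
# Hypothetical sketch: names are illustrative; the grid size follows the 240x240 GEI.
GRID_H, GRID_W = 240, 240

# (row offset, column offset) for each of the four moves.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(state, action, step=1):
    """Return the (row, col) reached by moving `step` pixels in direction `action`.

    Moves that would leave the image are clamped to the border (an assumption).
    """
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row * step, 0), GRID_H - 1)
    new_col = min(max(col + d_col * step, 0), GRID_W - 1)
    return (new_row, new_col)
```

For example, `move((10, 10), "up", step=3)` moves the agent three rows up, while a move off the left edge from `(0, 0)` leaves the agent where it is.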

Here, the processor 110 may perform the adversarial attack on the gait recognition system based on Q-learning, which is described in detail below.

In more detail, the processor 110 may generate an adversarial patch of a preset size based on the action performed by the agent. That is, the processor 110 may generate an adversarial patch of size n×n (e.g., 3×3) with randomly generated pixel values, centered on the pixel location that is the agent's state after the action.

The processor 110 may then provide the intelligent model with an adversarial gait energy image (GEI) to which the adversarial patch has been added. That is, the processor 110 may add the adversarial patch to the GEI at the patch's pixel location to obtain an adversarial GEI whose pixels are perturbed.
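
Patch generation and insertion might look like the sketch below. It rests on several assumptions that the description does not settle: the GEI is held as a 2-D `uint8` NumPy array, "adding" the patch is taken to mean overwriting the covered pixels, and a patch that extends past the image border is clipped. The function name `apply_patch` is hypothetical.

```python
import numpy as np


def apply_patch(gei, center, n=3, rng=None):
    """Return a copy of `gei` with an n x n patch of random gray values at `center`.

    Assumptions: `gei` is a 2-D uint8 array, the patch overwrites pixels,
    and the patch window is clipped at the image border.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = gei.shape
    row, col = center
    half = n // 2
    # Clip the patch window so it stays inside the image.
    r0, r1 = max(row - half, 0), min(row + half + 1, height)
    c0, c1 = max(col - half, 0), min(col + half + 1, width)
    adversarial = gei.copy()  # leave the clean image untouched
    adversarial[r0:r1, c0:c1] = rng.integers(
        0, 256, size=(r1 - r0, c1 - c0), dtype=np.uint8
    )
    return adversarial
```

Returning a copy keeps the clean GEI available, so every candidate location can be evaluated against the same unperturbed image.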

The processor 110 may then determine the reward for the action based on the prediction result for the adversarial GEI returned by the intelligent model. That is, if the prediction result lowers the confidence of the intelligent model, the processor 110 assigns a positive action quality value (e.g., +10) as the reward for the action; otherwise, it assigns a negative action quality value (e.g., -10).
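
The reward rule can be written in a few lines. In this sketch the function name `reward` and its parameters are hypothetical, and the baseline for "lowering the confidence" is assumed to be the model's confidence in the true label before the patch was applied; the description does not state whether the comparison is made against the clean image or against the previous step.

```python
def reward(conf_before, conf_after, gain=10.0):
    """Return +gain if the patch lowered the model's confidence, otherwise -gain.

    `conf_before` and `conf_after` are the model's confidence in the true label
    before and after the patch is applied (the choice of baseline is an assumption).
    """
    return gain if conf_after < conf_before else -gain
```

An unchanged confidence is treated as a failure here, so only strict reductions earn the positive reward.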

Through the adversarial patch generation, adversarial GEI provision, and reward determination described above, the processor 110 can perform the adversarial attack on the gait recognition system.

Next, the adversarial attack process using reinforcement learning for a gait recognition system according to a preferred embodiment of the present invention is described in more detail with reference to FIGS. 4 to 15.

FIG. 4 is a diagram illustrating an example of mapping reinforcement learning onto the present problem according to a preferred embodiment of the present invention.

In the present invention, vulnerabilities of the gait recognition system are exploited under a black-box attack. Gait refers to an individual's manner of walking and is regarded as a characteristic unique to each person. Like a fingerprint, signature, face, or voice, gait is a biometric trait that uniquely distinguishes one subject from another. Its main advantage over these other biometrics is that acquiring gait requires no cooperation from the subject and can be done at long range with low-resolution cameras. Human gait is therefore a viable and evolving technology for security/surveillance systems in which people are authenticated through their unique gait properties.

Recently, gait-related video surveillance systems have widely adopted deep learning (DL)-based strategies and achieved excellent performance. These studies observed that intelligent autonomous models based on convolutional neural network (CNN) architectures have effective decision-making capabilities and the highest recognition rates. It is therefore essential to assess the capabilities of gait recognition frameworks designed with a DL approach, especially when the attacker has no access to the system. Such an assessment properly exposes major flaws, in particular the adversarial robustness of the system against such attacks. The weaknesses of these intelligent gait-based autonomous systems remain hidden. In these strategies, one of the most common gait representations for performing gait recognition in security/surveillance systems is the gait energy image (GEI). However, the sensitivity of these algorithms under adversarial attack still needs to be characterized. In the literature, this sensitivity has been studied with silhouette-based representations. Studies show, however, that GEI-based gait representations are less noisy than silhouette-based representations. Consequently, the first important research question is how to exploit model vulnerabilities using the most effective gait representation. Second, when a gait recognition system is under close surveillance, the attacker often cannot access the model. Extending this line of work, the present invention proposes another variant of the black-box attack that uses reinforcement learning (RL), a branch of machine learning (ML) in which an agent attempts to take the best actions in an environment to increase its overall reward. In the RL-based black-box attack proposed in the present invention, the attacker has no knowledge of the environment, such as the model's gradients, parameters, or structure. For every clean image, the RL agent interacts with the environment to compute and determine the optimal location in the image at which perturbing pixels causes the underlying model to misclassify the clean image. Interacting with the environment is the only way to gather information about it. A sample mapping of an adversarial attack using RL is depicted in FIG. 4. The black-box attack proposed in the present invention yields significant results in breaching the security of the underlying system. The problem is formulated with Q-learning, one of the basic algorithms of RL, in which the agent attempts to determine the optimal location for a randomly generated perturbation so that the model fails to predict the correct class.

The contributions of the present invention are as follows.

- An efficient black-box adversarial attack using reinforcement learning (RL) is proposed to verify and investigate the decision-making ability of autonomous video surveillance systems designed around human gait.

- The proposed adversarial attack does not use the structure or parameters of the underlying deep learning (DL) model while crafting adversarial examples.

- The adversarial images are generated using a more compact gait representation, namely the gait energy image (GEI), and the added adversarial noise is less noticeable in the image.

Reinforcement learning-based adversarial attack proposed in the present invention

FIG. 5 is a diagram illustrating the adversarial attack process using reinforcement learning according to a preferred embodiment of the present invention.

This section describes the proposed reinforcement learning (RL)-based adversarial attack on a gait recognition system. The core motivation and concept of the algorithm is to carry out an adversarial attack on an autonomous gait-based video surveillance system when the attacker has no access to the model structure. The algorithm uses RL to interact with the underlying model and craft adversarial samples. Before an adversarial attack can be launched, however, an accurate gait recognition system built with deep learning (DL) must first be available. To this end, a DL-based gait recognition system is presented and an adversarial attack is performed on it. The following sections describe the procedure step by step. A pictorial representation of the proposed work is given in FIG. 5.

1. Video-based gait data

There are several ways to collect gait data for person identification. The present invention performs gait recognition using video-based gait data. In the first step, therefore, gait data of various individuals were collected from the Chinese Academy of Sciences (CASIA) database. This database consists of three main parts: CASIA A, B, and C. For the experiments, the present invention used the CASIA-B gait database, the largest multi-view gait database. This part contains video sequences of various individuals captured from different camera angles. More precisely, each individual has 10 video sequences of three types: normal walking, walking while carrying a bag, and walking while wearing a coat. The database contains data from 124 subjects, and each subject has 10 walking sequences for each of 11 camera views.

2. Gait data representation

Before the intelligent deep learning (DL) model is trained, the gait data collected from camera surveillance video are preprocessed into a specific gait representation. Researchers have proposed many such representations in the gait literature. Some studies, however, use the silhouette frames of each subject's video sequences directly for person identification.

Each video sequence consists of many frames, and processing every frame increases the computational complexity of the algorithm. To overcome this problem, the present invention uses the gait energy image (GEI), one of the most widely used gait representations, which can be calculated using [Equation 1].
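
Written out under the standard definition of a gait energy image, which agrees with the symbols x, y, t, and T used in this description, [Equation 1] takes the following form. The symbol B_t(x,y), denoting the silhouette frame at frame number t, is an assumed notation and is not taken from the original.

```latex
\mathrm{GEI}(x, y) = \frac{1}{T} \sum_{t=1}^{T} B_t(x, y)
```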

In [Equation 1], x and y denote the pixel position and t denotes the frame number, running from t=1 to T. The resulting image, written GEI(x,y), is called the gait energy image (GEI). The GEI is the average of all frames of a video sequence for a specific subject. After the GEIs for all subjects are compiled, each GEI is resized to a predefined size of 240×240×1.
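
The frame averaging described here can be illustrated with NumPy. This is a sketch under the assumption that the silhouette frames of one sequence are already aligned and stacked into an array of shape (T, H, W); the function name `gait_energy_image` is hypothetical, and the final resize to 240×240×1 is left out because the description does not name a resizing method.

```python
import numpy as np


def gait_energy_image(frames):
    """Average T aligned silhouette frames of shape (T, H, W) into one (H, W) GEI."""
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty stack of 2-D frames")
    # Pixel-wise mean over the frame axis, i.e. (1/T) * sum over t.
    return frames.mean(axis=0)
```

Averaging collapses a whole sequence into a single image, which is why the classifier processes one GEI per sequence instead of every frame.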

3. Intelligent model based on a deep convolutional neural network (DCNN)

After the preprocessing stage of the gait recognition system, a deep convolutional neural network (DCNN) is designed that recognizes individuals by their gait style using the GEI-based gait representation, providing an end-to-end autonomous solution. The present invention uses the same DCNN architecture as proposed by Bukhari et al. [M. Bukhari, K. B. Bajwa, S. Gillani, M. Maqsood, M. Y. Durrani, I. Mehmood, et al., "An efficient gait recognition method for known and unknown covariate conditions," IEEE Access, vol. 9, pp. 6465-6477, 2020.]. This architecture has four convolutional layers with a filter size of 3×3, each followed by a leaky rectified linear unit (leaky ReLU) activation function. To reduce the size of the output feature maps produced by the preceding layer, a max-pooling layer with a 2×2 window is inserted after every convolutional layer. The four convolutional layers have 16, 32, 64, and 124 filters, respectively. A flatten layer then converts the output of the last max-pooling layer to one dimension so that it can be passed to the dense layers. Finally, two dense layers are added to perform classification based on the features extracted by the convolutional and max-pooling layers. The number of units in the last dense layer equals the number of subjects in the gait database. The other model hyperparameters include the weight initialization and optimization methods, which are Xavier and Adam, respectively. The learning rate is set to 0.001, the batch size to 4, and the total number of epochs to 30. Because this is a multi-class classification problem, the loss function is categorical cross-entropy.
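
To make the layer stack concrete, the sketch below traces the feature-map size through the four convolution and max-pooling stages for a 240×240 input. It is only a shape calculation, not a model definition. The convolution padding and pooling stride are not given in the description, so both the "same" and "valid" padding cases are covered and the pooling stride is assumed to equal the 2×2 window. The function name `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(input_size=240, filters=(16, 32, 64, 124),
                      kernel=3, pool=2, same_padding=True):
    """Trace (height, width, channels) after each conv + max-pool stage.

    Returns the per-stage shapes and the length of the flattened vector.
    Padding mode and pooling stride are assumptions, not from the description.
    """
    size = input_size
    trace = []
    for n_filters in filters:
        if not same_padding:
            size = size - kernel + 1  # a 'valid' 3x3 convolution trims the border
        size = size // pool           # 2x2 max pooling halves each dimension
        trace.append((size, size, n_filters))
    flat_length = size * size * filters[-1]
    return trace, flat_length
```

With "same" padding the spatial size goes 240, 120, 60, 30, 15, so the flatten layer would emit 15 × 15 × 124 = 27,900 values; with "valid" padding the last map is 13 × 13 and the flattened length is 20,956.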

4. Adversarial attack using reinforcement learning

FIG. 6 is a diagram illustrating an adversarial attack on a surveillance system based on human gait analysis according to a preferred embodiment of the present invention.

Reinforcement learning (RL) is the field of machine learning (ML) that studies problems, and their solutions, in which an intelligent agent learns how to act in a given scenario so as to increase its reward while interacting with an environment. RL can be very useful for problems in which there is no instructor to tell the agent, or learner, which action to take. The agent must therefore find its way through trial and error.

In addition, reinforcement learning (RL) is an approach to understanding and carrying out goal-directed learning and optimal decision-making. It differs from other computational techniques in its emphasis on an agent learning through direct interaction with its environment. Some RL algorithms are model-based and others are model-free. One of the simplest RL algorithms is Q-learning, an off-policy algorithm that learns to determine the optimal action for the current state. It learns only a policy that increases the value of the reward. A pictorial representation of the adversarial attack is shown in FIG. 6.

More specifically, the adversarial attack problem is expressed in the reinforcement learning (RL) paradigm by formulating the various RL components or steps (e.g., environment, state space, actions, goal, and state transitions). The environment of the proposed algorithm is the gait energy image (GEI) with which the agent interacts. The state is the pixel location at which the agent acts. A reward is assigned after each action. The amount of reward given to the agent depends on how well the agent reduces the model's confidence in the actual label of the image. As shown in FIG. 6, the RL agent perturbs a clean GEI with an adversarial patch of random pixels. The resulting adversarial GEI is passed as a query to the underlying deep learning (DL)-based surveillance system. The model predicts an incorrect subject ID as the query result. The step-by-step formulation is as follows.

4-1. Adversarial patch

The proposed adversarial attack on the gait-based automatic surveillance system is performed without access to the network's structure or parameters. The main purpose of this attack is to verify the reliability of the deep convolutional neural network (DCNN) model under subject authentication. Consider an underlying target model f whose input is a 240×240×1 gait energy image (GEI) and whose output is the correct class label t of that GEI. Let v=(x,y) be an adversarial patch of size n×n with random pixel values between 0 and 255. When this patch is added to the original GEI at the optimal location determined by the reinforcement learning (RL) agent, the model output no longer matches the correct label, as in [Equation 2].

In [Equation 2] above, n×n is the size of the adversarial patch (e.g., 3×3). Because the gait energy image (GEI) is a grayscale image, the pixel values of the patch are randomly generated within the grayscale value range. In the proposed adversarial attack, the agent's main goal is to find the added noise (i.e., the adversarial patch) that causes the underlying deep learning (DL) model to misclassify the GEI. The added patch perturbs only a few pixels of the whole GEI, so the change in the image is hardly noticeable to the naked eye. As a result, the noise is effectively hidden in the image.

4-2. Q-learning

The present invention proposes a black-box adversarial attack using Q-learning. In this context, the agent, or learner, is the decision maker, and everything outside the agent is the environment. At any time step t, the agent observes knowledge/information about the environment, called the state, and takes an action based on this current state. As a result of the action, the agent receives a reward from the environment and transitions to another state.

More technically, a reinforcement learning (RL) problem is often expressed as a Markov decision process given by the tuple (S,A,R,ρ,γ). Here, S is the set of all possible states, A is the set of actions the agent can perform, R is the reward, ρ is the transition probability, and γ is the discount factor. The agent's main goal is to determine the optimal policy π(a|s) that increases the total expected discounted cumulative reward by performing action a∈A in state s∈S, as in [Equation 3] to [Equation 5].

In [Equation 4], the immediate reward obtained at step t is denoted R(τ). One of the key components of reinforcement learning (RL) is the policy, denoted π, which gives the probability of each action being taken. RL algorithms are classified into on-policy and off-policy approaches according to how this policy is used. In Q-learning, an off-policy, model-free RL algorithm, the agent interacts with the environment to attain the best state-action-value function. The method maintains a Q-table (also called a state-action table) that holds the Q-value of each state-action pair. The Q-table is initialized with zeros and is subsequently updated using the temporal difference method of [Equation 6].
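
For reference, the temporal difference update of tabular Q-learning is conventionally written as below. This is the textbook form, given on the assumption that [Equation 6] follows it; the subscripted symbols s_t, a_t, and r_{t+1} are standard notation and are not taken from the original.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```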

In [Equation 6], α denotes the learning rate and, like the discount factor γ, takes a value in [0,1]. For a given state-action pair, the actual Q-value is the current table entry for that pair, whereas the target Q-value is the immediate reward plus the discounted Q-value of the next state. The table is updated adaptively for every state-action pair over successive iterations. The ε-greedy method is often used to speed up convergence of the Q-table. In this approach the Q-values are generally initialized to 0, which means that every possible action in a state is equally likely to be selected. An exploration-exploitation trade-off is therefore used to make the Q-table converge over the iterations. Exploration selects a random action and updates the Q-value of the corresponding state-action pair. Exploitation selects the greedy action, that is, the action with the highest Q-value in the Q-table for the given state. The step-by-step formulation of this algorithm for the present problem is described below.
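
The ε-greedy selection and the temporal difference update can be sketched as follows. This is a generic tabular Q-learning illustration and not the embodiment's code: the function names, the default values of `alpha` and `gamma`, and the use of integer state and action indices are all assumptions.

```python
import numpy as np


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))  # exploration: random action
    return int(np.argmax(q_table[state]))           # exploitation: best known action


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one temporal difference update in place and return the TD error."""
    # Target: immediate reward plus the discounted best Q-value of the next state.
    target = reward + gamma * np.max(q_table[next_state])
    td_error = target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return td_error
```

Starting from a zero table, a reward of +10 with `alpha=0.1` raises the visited entry to 1.0, after which greedy selection in that state prefers the rewarded action.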

4-2-1. Environment modeling

To carry out an adversarial attack successfully, the agent interacts with the environment. In this problem, the environment is the two-dimensional (2D) gait energy image (GEI) of a particular subject. An image can be regarded as a 2D grid with the same size as the image dimensions. Initially the agent has little information about the environment (the image); it gains information by interacting with it, for example by manipulating pixel values of the image to perform the adversarial attack.

4-2-2. State space

Figure 7 is a diagram illustrating the interaction between the agent and the environment according to a preferred embodiment of the present invention.

The state space is the set of possible states (positions) in which the agent can explore and exploit the environment. In the context of this problem, a state is a pixel position in the gait energy image (GEI) of a particular subject, so the total number of states equals the number of pixel positions in the GEI. The agent can move to any pixel position, manipulate it, and create an adversarial sample. The agent's state transitions are illustrated in Figure 7.

4-2-3. Actions

Actions are all the moves the agent can make from a given state. In this problem, the agent can take four actions at every pixel position of the gait energy image (GEI): "up", "down", "left", and "right".
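The four actions can be sketched as offsets on the pixel grid. This is an illustration only: the name `move` is hypothetical, "up" is taken to decrease the first coordinate (matching the (x-3, y) example the document gives for a larger step), the 240×240 default matches the GEI size reported in the document, and clamping at the image border is an assumption, since the text does not say what happens at the edges.

```python
# Offsets are (row, column); "up" decreases the row index.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(position, action, step=1, height=240, width=240):
    """Return the new (row, col) after an action, clamped to the image bounds."""
    d_row, d_col = ACTIONS[action]
    row = min(max(position[0] + d_row * step, 0), height - 1)
    col = min(max(position[1] + d_col * step, 0), width - 1)
    return (row, col)


print(move((10, 10), "up"))          # (9, 10)
print(move((10, 10), "up", step=3))  # (7, 10)
print(move((0, 5), "up"))            # (0, 5), clamped at the top edge
```

The `step` parameter covers both the one-pixel moves and the larger jumps evaluated later in the document.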

4-2-4. Agent goal and reward

Figure 8 is a diagram illustrating the algorithm of the adversarial attack using Q-learning according to a preferred embodiment of the present invention.

A reward is a numerical value assigned to the agent when it takes an action in the environment, and it indicates the quality of that action. Two reward values are used for this problem: +10 and -10. When the agent adds an adversarial patch at a particular position, the quality of the action is computed by predicting the label of the gait energy image (GEI) after the patch-based adversarial perturbation has been added. If the added perturbation lowers the model's probability (confidence) for the true class of the GEI, the agent is assigned a reward of +10; otherwise the reward is -10. A reward of -10 is also called a penalty for the reinforcement learning (RL) agent. More technically, consider a target model f that takes the GEI of a particular subject, with given dimensions, and classifies it with the correct label t with confidence probability ρ. When the agent moves to a state (pixel position) in the GEI and takes the action of perturbing that position with an adversarial patch, the action can be evaluated by computing the model's probability ρ for the GEI with the perturbed position. The agent is rewarded according to this probability. Mathematically, the procedure is defined in [Equation 7].

In [Equation 7], at each time step t, when the agent takes an action that perturbs a position of the gait energy image (GEI), the quality of the action is determined by computing the model's confidence in the true class and comparing it with the confidence ρ after the perturbation has been added. If the perturbation added by the agent's action (that is, the adversarial patch) lowers the model's confidence in the true class, the action is good and the agent receives +10. If the action taken by the reinforcement learning (RL) agent does not reduce the target model's probability for the true class, the action is poor and the agent is given a penalty (that is, a negative reward of -10, as illustrated in Figure 7). In this way the agent learns the environment; when it reaches the desired state given in [Equation 2], the episode ends and the algorithm stops. More precisely, the agent's goal is reached when it arrives at a position where perturbation with the adversarial patch causes the model to misclassify the image. Reaching the goal state is interactive, because the agent takes actions and is rewarded according to how successfully they reduce the model's confidence in the true image label. The step-by-step operation of the adversarial attack using Q-learning is given in Algorithm 1, shown in Figure 8.
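The reward rule and the stopping condition described above can be sketched as follows. The function name and arguments are hypothetical, and the sketch assumes that the baseline is the model's confidence in the true class before the patch is added; the image for [Equation 7] is not reproduced in this text, so this is a reading of the prose rather than the patent's formula.

```python
def attack_reward(conf_before, conf_after, predicted_label, true_label):
    """Return (reward, done) for one perturbation step."""
    # +10 if the patch lowered the model's confidence in the true class, else -10.
    value = 10 if conf_after < conf_before else -10
    # The goal state is reached once the model misclassifies the perturbed GEI.
    done = predicted_label != true_label
    return value, done


print(attack_reward(0.95, 0.80, predicted_label=3, true_label=3))  # (10, False)
print(attack_reward(0.95, 0.97, predicted_label=3, true_label=3))  # (-10, False)
print(attack_reward(0.95, 0.30, predicted_label=7, true_label=3))  # (10, True)
```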

More precisely, as Algorithm 1 shows, its inputs are the gait energy image (GEI) X, the target model to be fooled, and the Q-learning parameters (the learning rate and the discount factor), each set to 0.9. Its output is the goal state reached by the agent, that is, the position at which the adversarial patch was added. Algorithm 1 is run on every GEI for a fixed number of episodes and iterations. As given in line 3, the hyperparameters, including the number of episodes and the number of iterations (steps) per episode, are initialized first, together with the state and action sizes. A series of episodes is then run for each GEI (that is, the GEI of each subject computed using [Equation 1]). In every state the agent takes an action (line 8) that results in a state transition and a reward (lines 9, 10, and 11). The state is simply the pixel position of the GEI at which the agent is located in line 8, and an n×n patch is then added at that pixel position as shown in [Equation 2]. After adding the patch and making a transition to the left, right, upper, or lower state, the agent receives the reward computed by [Equation 7]. The Q-table values are then updated by computing the temporal difference of [Equation 6], where γ is the discount factor and α is the learning rate. The table is updated adaptively for each state-action pair over subsequent iterations. The whole of Algorithm 1 is repeated for the set number of episodes.
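The loop described for Algorithm 1 can be sketched in plain Python. This is not the patent's Algorithm 1 (which appears only in Figure 8); it is a simplified illustration under several assumptions: a random start position per episode, clamping at the image border, ε = 0.5, a patch applied to a fresh copy of the clean image at each step, and a toy stand-in `toy_model` in place of the DCNN. All function names are hypothetical.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def add_patch(image, center, n, rng):
    """Return a copy of image with an n x n patch of random values at center."""
    out = [row[:] for row in image]
    half = n // 2
    for r in range(center[0] - half, center[0] + half + 1):
        for c in range(center[1] - half, center[1] + half + 1):
            if 0 <= r < len(out) and 0 <= c < len(out[0]):
                out[r][c] = rng.random()
    return out


def q_learning_attack(image, model, true_label, episodes=10, steps=20,
                      alpha=0.9, gamma=0.9, epsilon=0.5, patch=3, seed=0):
    """Search for a patch position that makes model misclassify image."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    q = {(r, c): [0.0] * 4 for r in range(height) for c in range(width)}
    _, clean_conf = model(image)
    for _ in range(episodes):
        state = (rng.randrange(height), rng.randrange(width))  # assumed start
        for _ in range(steps):
            if rng.random() < epsilon:  # convention in the text: exploit
                action = max(range(4), key=lambda a: q[state][a])
            else:                       # otherwise explore
                action = rng.randrange(4)
            d_row, d_col = ACTIONS[action]
            nxt = (min(max(state[0] + d_row, 0), height - 1),
                   min(max(state[1] + d_col, 0), width - 1))
            label, conf = model(add_patch(image, nxt, patch, rng))
            reward = 10 if conf < clean_conf else -10
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if label != true_label:
                return state  # goal state: a patch here fools the model
    return None


def toy_model(image):
    """Hypothetical classifier: returns (label, confidence in class 0)."""
    conf = max(0.0, 0.9 - abs(image[4][4] - 0.5) * 2)
    return (0 if conf >= 0.5 else 1), conf


clean = [[0.5] * 8 for _ in range(8)]
print(q_learning_attack(clean, toy_model, true_label=0))
```

The toy model is sensitive to a single pixel, so any position returned lies within one pixel of it; the function returns `None` when no fooling position is found within the episode budget.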

Experimental results of the present invention

This section describes the results of the proposed adversarial attack. Before examining them, the performance of the deep learning (DL)-based gait recognition system is evaluated while the system is not under attack. Accordingly, the first subsection reports the results of the gait recognition system in detail, and the second subsection presents the results of the adversarial attack. More precisely, the decision-making ability of the autonomous gait recognition system is described in detail. The evaluation metrics are defined below.

1. Accuracy

Accuracy is a widely used metric for evaluating the performance of a classification model. Specifically, it measures how many subject IDs were predicted correctly out of all subjects in the training data. Mathematically, it is defined by [Equation 8].

2. Success rate

The success rate indicates how successful the reinforcement learning (RL) agent is at fooling the target model or system, or at reducing its accuracy. It is the opposite of the actual accuracy of model f. Mathematically, the success rate is calculated as in [Equation 9].
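The two metrics can be illustrated with a short sketch. The images for [Equation 8] and [Equation 9] are not reproduced in this text, so both functions are assumptions: accuracy is taken as the usual percentage of correct predictions, and the attack success rate is taken as the drop from clean accuracy to accuracy under attack, which matches the figures the document later reports for normal walking with 50 subjects (97.98% falling to 62%, success rate 35.98%). The document also describes the success rate as the percentage of misclassified images, so this reading is not the only possible one.

```python
def accuracy(predicted_ids, true_ids):
    """Percentage of samples whose subject ID was predicted correctly."""
    correct = sum(p == t for p, t in zip(predicted_ids, true_ids))
    return 100.0 * correct / len(true_ids)


def accuracy_drop(clean_accuracy, attacked_accuracy):
    """Drop in accuracy, in percentage points, caused by the attack."""
    return round(clean_accuracy - attacked_accuracy, 2)


print(accuracy([1, 2, 3, 3], [1, 2, 3, 4]))  # 75.0
print(accuracy_drop(97.98, 62.0))            # 35.98
```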

3. Results of the deep convolutional neural network (DCNN)

Figure 9 is a table showing the results of the gait recognition model according to a preferred embodiment of the present invention.

To evaluate the deep convolutional neural network (DCNN)-based gait recognition system, data from multiple subjects, in the gait representation called the gait energy image (GEI), are given to the model as training input. As described in the data section, the CASIA-B gait database contains data from 124 subjects, each with 10 gait sequences under 3 different walking conditions: normal walking (nm), walking while carrying a bag (bg), and walking while wearing various coats (cl). Each walking condition is handled separately. Of the 10 walking videos per individual, 6 are normal walking videos. These are divided into a gallery set and a probe set: nm-01 to nm-04 are kept in the gallery set and the remaining nm-05 and nm-06 in the probe set. In this experimental setting the model achieves 97.98% accuracy.

In the second experiment, normal walking videos are combined with videos of subjects carrying various kinds of bags. For each subject, the normal videos nm-01 to nm-03 and one bag sequence (bg-01) are kept in the gallery set, and the remaining videos (nm-04 and bg-02) are used in the probe set. The DCNN model reaches 97.50% in this experiment (second row of the table in Figure 9). In the third walking condition, subjects walk wearing various kinds of coats. Here the gallery set holds each subject's normal walking videos and one coat video (nm-01 to nm-03 with cl-01), and the probe set holds nm-04 and cl-02. The details of the experimental settings and the resulting accuracies are given in the table in Figure 9. These results show that the model makes intelligent decisions, identifying individuals from their gait with remarkable accuracy.
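The three gallery/probe splits described above can be written down as a small lookup table. The sequence IDs come from the text; the dictionary layout and the helper name `split_of` are illustrative assumptions.

```python
# Gallery/probe sequence IDs per experiment, as described in the text.
SPLITS = {
    "normal": {"gallery": ["nm-01", "nm-02", "nm-03", "nm-04"],
               "probe": ["nm-05", "nm-06"]},
    "bag": {"gallery": ["nm-01", "nm-02", "nm-03", "bg-01"],
            "probe": ["nm-04", "bg-02"]},
    "coat": {"gallery": ["nm-01", "nm-02", "nm-03", "cl-01"],
             "probe": ["nm-04", "cl-02"]},
}


def split_of(condition, sequence_id):
    """Return 'gallery', 'probe', or None if the sequence is unused."""
    for name, ids in SPLITS[condition].items():
        if sequence_id in ids:
            return name
    return None


print(split_of("normal", "nm-05"))  # probe
print(split_of("bag", "bg-01"))     # gallery
print(split_of("coat", "bg-02"))    # None
```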

4. Results of the adversarial attack using Q-learning

Figure 10 is a table showing the results of the gait recognition model under adversarial attack according to a preferred embodiment of the present invention.

The results indicate that the deep learning (DL)-assisted, gait-based video surveillance system performs well in terms of accuracy. But what happens if an attacker gains access to the surveillance system through an adversarial attack and disrupts the intended behavior of the model? The decision-making ability of the model under adversarial attack is described below.

To answer this, an adversarial attack is designed in which the attacker has no access to the parameters of the target model. The attack is carried out by a reinforcement learning (RL)-based intelligent agent that aims to perturb a subject's gait energy image (GEI) so that the model misclassifies it. Initially the agent has no information about the GEI, which serves as the environment. It takes an action, and the quality of the action is determined by sending the query GEI to the model. The agent receives a numerical reward based on the model's prediction. This setting assumes that the attacker can pass query images to the target model; the model's predictions then help to build a strong adversarial attack. In the experimental setup, the model was first trained on the subjects' normal walking sequences, and the predictions of this trained model are used to carry out the attack. The normal walking videos were divided into a gallery set and a probe set, and the model was trained on the gallery set. Then, for each GEI in the probe set, an RL-based agent interacts with the GEI and the model's predictions to determine the optimal position for an adversarial patch generated from random grayscale pixel values. The optimal position is one where the added patch causes the model to misclassify the image. The patch is generated randomly at every iteration of every episode. The algorithm is run for 10 episodes of 20 iterations each. In each iteration the agent takes an action using the ε-greedy policy. First, a random number between 0 and 1 is generated.

If this random number is less than ε, the agent selects the best action according to the Q-table; otherwise it takes a random action from the four possible actions. The "up" action moves the agent one pixel up in the gait energy image (GEI), and the "down" action moves it one pixel down. Likewise, the "left" and "right" actions move the agent one pixel to the left and to the right. The center pixel of the patch marks the agent's position, and the neighboring pixels form the agent's boundary and carry the perturbation. The probe data set was divided into three sets. The first contains the data of the first 25 subjects, the second extends this to the first 50 subjects, and the last experiment uses the data of all 124 subjects. To evaluate the adversarial attack, the agent was first evaluated on the data of the first 25 subjects of the probe set.
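The action-selection rule stated above can be sketched as follows. The function name `select_action` is hypothetical. Note that the sketch follows the convention given in the text, where a random number below ε leads to the greedy action; many textbook presentations use the opposite convention, with ε as the exploration probability.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick an action index following the convention described in the text."""
    if rng.random() < epsilon:
        # Below epsilon: exploit, taking the action with the highest Q-value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Otherwise: explore with a uniformly random action.
    return rng.randrange(len(q_values))


q_row = [0.0, 9.0, -1.0, 2.5]  # Q-values for up, down, left, right
print(select_action(q_row, epsilon=1.0))              # 1 (always greedy)
print(select_action(q_row, epsilon=0.0) in range(4))  # True (always random)
```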

The results of the adversarial attack under the normal walking condition are given in row 1 of the table in Figure 10 for the probe set of 25 subjects. The gait recognition algorithm, which reaches 97.98% accuracy under normal walking, becomes ineffective under adversarial attack, and its decision-making ability falls to 32%. The attack was also evaluated under the two other walking conditions (normal walking combined with a bag, and with a coat). In the bag setting, the probe set contains both normal and bag-based walking sequences; the results are given in the second row of the table in Figure 10 for the probe set of 25 subjects. In the third scenario, the model is trained on the subjects' normal and coat videos, and the probe set likewise consists of separate normal and coat videos; the results are given in the third row of the table in Figure 10 for the probe set of 25 subjects. In the next experiment, the data of the first 50 subjects of the probe set were used for all three walking conditions. Model accuracy dropped considerably in this setting. For normal walking, accuracy falls from 97.98% to 62%, giving an attack success rate of up to 35.98%. The success rate is the percentage of images that the model misclassifies. With the bag probe set the model accuracy falls to 61%, and with the coat probe set it falls to 30%. In the last experiment, the data of all 124 subjects in the database were used. In this case the final accuracy of the model under the normal, bag, and coat conditions is 46.37%, 30.89%, and 58.13%, respectively. These experiments show that accuracy drops more sharply when subjects wear different types of coat. All of these experiments were run with the reinforcement learning (RL) agent moving one step at a time (that is, each transition from one state to another moves the agent one step "up", "down", "left", or "right").

5. Confidence on adversaries

Figure 11 shows density plots of the confidence values for each gait energy image in probe sets containing all three walking conditions, according to a preferred embodiment of the present invention.

The model's probability when predicting an adversarial image is also regarded as a performance measure for the adversarial attack. The model's confidence on adversarial gait energy images (GEIs) was computed using probe sets of all sizes. The model's confidence value is determined for every adversarial GEI to which the model assigns an incorrect class label, and the resulting density plots are shown in Figure 11.

The first graph in Figure 11 shows the confidence values for each gait energy image (GEI) under all walking conditions using a probe set of 50 subjects. The density peaks at a confidence of about 60%. The second graph shows the confidence values for the probe set containing the first 50 subjects; here too, normal GEIs have a confidence value of about 60% for this model. The third graph shows the confidence values for all GEIs using the probe set containing all 124 subjects. Under all walking conditions, the proposed adversarial attack degrades the gait-based video surveillance system efficiently and with good confidence scores.

6. Jump of the reinforcement learning (RL) agent

Figure 12 is a table showing the results of the gait recognition model under adversarial attack with different step sizes according to a preferred embodiment of the present invention.

The reinforcement learning (RL)-based intelligent agent that searches for the optimal position in the gait energy image (GEI) was also tuned with different settings. The results in the table in Figure 10 show that the adversarial attack carried out by the RL agent is very promising. In those experiments, however, each action moves the agent from one pixel position to another in a step of only one pixel. The attack was therefore also evaluated with the RL agent taking larger steps per action, and the results were assessed for all three walking conditions. In the first scenario, the normal walking condition was used for the gallery and probe sets, and the deep convolutional neural network (DCNN) model was trained on the gallery set containing the individuals' normal walking videos.

Then, during testing, the probe set containing the subjects' gait energy images (GEIs) was perturbed with random pixels at the positions chosen by the reinforcement learning (RL) agent. Row 1 of the table in Figure 12 lists the results of the adversarial attack on the normal walking probe set containing the first 25 subjects. As before, the agent takes actions that cause state transitions while it interacts with the environment, but in this scenario each transition moves the agent by three pixels. For any position (x, y), the "up" action moves the agent three steps up to (x-3, y). In this case the accuracy drop for normal walking is 56%, as listed in row 1 of the table in Figure 12 for the probe set of 25 subjects. The same experiment was carried out with adversarial examples generated by the RL agent for subjects walking while carrying bags of various shapes and types; here the accuracy drop is 64% and the attack success rate is 31.5%. For the coat condition, the attack success rate is 73.5%. Next, adversarial examples were generated using the data of the first 50 subjects.

The attack success rates and post-attack model accuracies for all walking conditions are given in the table in Figure 12 for the probe set of 50 subjects. The attack success rates for the normal, bag, and coat sets are 59.98%, 43.5%, and 51%, respectively, which demonstrates the effectiveness of the proposed attack. The experiment was also repeated for all subjects. In addition, the agent's step size was increased further, to 5, and all three walking conditions were evaluated. With a step size of 5, the new position (state) of the agent after the "up" action at (x, y) is (x-5, y). For the probe set of the first 25 subjects only, the attack success rates under the normal, coat, and bag conditions are 45.98%, 63.5%, and 51.5%, respectively. For the probe set of the first 50 subjects only, they are 36.98%, 44.5%, and 51.5%, respectively. The third experiment was carried out on all 124 subjects in the database. With a step size of 5, the drop in accuracy for the normal, coat, and bag conditions is 65.32%, 19.91%, and 47.96%, respectively.

7. Adversary visualizations

Figure 13 shows an example of a clean gait energy image and an adversarial gait energy image according to a preferred embodiment of the present invention, Figure 14 shows the attack success rates for all walking conditions according to a preferred embodiment of the present invention, and Figure 15 is a table showing the results of a large number of training episodes for one individual according to a preferred embodiment of the present invention.

The proposed attack produces an adversarial example in the form of a gait energy image (GEI) by perturbing the real GEI with a randomly generated adversarial patch. The GEI has size 240×240×1, that is, it is a 2D array (240×240). When a clean GEI is perturbed with a 3×3 adversarial patch at the position supplied by the reinforcement learning (RL) agent, the resulting image is called an adversary or adversarial GEI. Sample clean and adversarial GEIs are shown in Figure 13: the left image is the subject's clean GEI and the right image is the adversarial GEI, with the perturbation highlighted by a red square. The two look very similar (that is, the added perturbation is not noticeable). In an image of 240×240 = 57,600 pixels, a 3×3 adversarial patch perturbs only 9 pixels, which is difficult to notice with the naked eye.
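The pixel arithmetic above can be checked with a small sketch that applies a 3×3 patch to a 240×240 array. The function name `apply_patch` is hypothetical, and the 0 to 255 grayscale range and the all-zero placeholder image are assumptions; the text only says the patch is built from random grayscale pixel values.

```python
import random


def apply_patch(gei, center, size=3, rng=random):
    """Return a patched copy of gei and the number of pixels overwritten."""
    out = [row[:] for row in gei]
    half = size // 2
    overwritten = 0
    for r in range(center[0] - half, center[0] + half + 1):
        for c in range(center[1] - half, center[1] + half + 1):
            if 0 <= r < len(out) and 0 <= c < len(out[0]):
                out[r][c] = rng.randint(0, 255)  # assumed grayscale range
                overwritten += 1
    return out, overwritten


clean_gei = [[0] * 240 for _ in range(240)]  # placeholder image, not real data
adversarial_gei, overwritten = apply_patch(clean_gei, center=(120, 120))
print(overwritten)                      # 9
print(100 * overwritten / (240 * 240))  # 0.015625 (percent of all pixels)
```

Only about 0.016% of the pixels are touched, which is consistent with the statement that the perturbation is hard to see.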

The added noise is therefore hidden within the GEI. This demonstrates another practical merit of the proposed adversarial attack: it produces effective results while remaining inconspicuous. The average success rate over all experiments for all gait conditions is plotted in Figure 14. The agent's performance was assessed by running numerous episodes.

In each episode, the agent performed 100 iterations. In the experiments, the agent was also trained with 10 episodes, performing 20 iterations per episode. In addition, in this experiment, the GEIs of individual #01 were taken from all three gait conditions of the corresponding probe set. For each GEI, the agent was trained for 1000 episodes, performing 100 iterations per episode. The results of this setup are listed in the table in Figure 15. They show that when the RL agent is switched to test mode after 1000 training episodes, it performs the most feasible action based on the Q-table built during training.
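The train-then-test procedure described above can be sketched with tabular Q-learning. This is an illustration under stated assumptions, not the procedure used in the experiments: the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are illustrative values not given in the source, the names `train_q_table`, `greedy_action`, and `toy_reward` are hypothetical, and a small 5×5 grid with a fixed target pixel stands in for the 240×240 GEI and the recognition system's feedback.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, size):
    """Move one pixel in the chosen direction, staying inside the image."""
    r = min(max(state[0] + ACTIONS[action][0], 0), size - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), size - 1)
    return (r, c)


def greedy_action(q, state):
    """Test mode: pick the action with the highest learned Q-value."""
    return max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))


def train_q_table(reward_fn, size, episodes, iterations,
                  alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over pixel positions.

    alpha, gamma and epsilon are illustrative assumptions, not values
    taken from the source.
    """
    rng = random.Random(seed)
    q = {}  # (state, action) -> Q-value; missing entries count as 0.0
    for _ in range(episodes):
        state = (rng.randrange(size), rng.randrange(size))
        for _ in range(iterations):
            # Epsilon-greedy choice between exploring and exploiting.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy_action(q, state)
            nxt = step(state, action, size)
            reward = reward_fn(nxt)
            best_next = max(q.get((nxt, a), 0.0) for a in range(len(ACTIONS)))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def toy_reward(state, target=(2, 2)):
    """Stand-in for the recognizer's feedback: +1 at the target pixel, -1 elsewhere."""
    return 1.0 if state == target else -1.0


# Usage: train on a 5x5 grid, then query the Q-table greedily (test mode).
q_table = train_q_table(toy_reward, size=5, episodes=200, iterations=50)
best = greedy_action(q_table, (2, 1))  # index into ACTIONS
```

In the setting of the text, `episodes` and `iterations` would be 1000 and 100 and the reward would come from the gait recognition system rather than from a fixed target.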

8. Conclusion

The proposed adversarial attack shows encouraging results in exploiting the vulnerabilities of deep convolutional neural network (DCNN)-based autonomous gait recognition systems. The present invention demonstrated that a DCNN-based autonomous surveillance system built on human gait analysis produces accurate results, yet adding a minimal perturbation to the input GEI degrades those results severely. Moreover, a surveillance system maintained under strict security protocols is not easy to deceive: the attacker cannot access the underlying model or its gradient information, parameters, or structure. For this scenario, a black-box adversarial attack was proposed to probe the vulnerability and robustness of DCNN-based surveillance systems. In the present invention, the attacker only needs to be able to submit a query image to the system and obtain the result; no other information about the model's structure or parameters is required. The result that the system returns for a query image tells the agent which action to take next. The attack is carried out by an RL-based intelligent agent that engages with the environment and learns to determine the optimal location in the GEI. The agent's main goal is to interact with the environment by performing different actions, moving to find the location at which an added adversarial patch turns a clean image into an adversarial example.
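The black-box feedback loop can be illustrated with a short sketch in which the only access to the model is a query function. The names `reward_from_query` and `toy_model` are hypothetical, the reward values +1 and -1 are assumptions, and the toy recognizer is a stand-in rather than a DCNN. The rule itself follows the claims: a positive action quality value when the prediction for the adversarial GEI lowers the model's confidence, and a negative value otherwise.

```python
def reward_from_query(query_model, clean_gei, adversarial_gei, subject_id,
                      positive=1.0, negative=-1.0):
    """Score one action using only the model's outputs for two query images.

    `query_model` is assumed to return a mapping from subject ID to
    confidence; no gradients, parameters, or structure are used.
    """
    clean_scores = query_model(clean_gei)
    adversarial_scores = query_model(adversarial_gei)
    lowered = adversarial_scores[subject_id] < clean_scores[subject_id]
    return positive if lowered else negative


def toy_model(gei):
    """Stand-in recognizer: confidence for subject '01' drops as pixels brighten."""
    total = sum(sum(row) for row in gei)
    pixels = len(gei) * len(gei[0])
    confidence = 1.0 - total / (255.0 * pixels)
    return {"01": confidence, "02": 1.0 - confidence}


# Usage: a single perturbed pixel lowers the toy model's confidence.
clean = [[0] * 4 for _ in range(4)]
adversarial = [list(row) for row in clean]
adversarial[1][1] = 255
reward = reward_from_query(toy_model, clean, adversarial, "01")
```

In practice the clean-image score could be queried once and cached, since it does not change between actions.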

In summary, gait identification based on deep learning (DL) has recently emerged as a biometric technology for surveillance. Using a patch-based black-box adversarial attack driven by RL, the present invention exploited the vulnerability and decision-making behavior of the deep learning model in a gait-based autonomous surveillance system under the condition that the attacker cannot access the underlying model's gradients, structure, and so on. Such automated surveillance systems are protected and block attacker access. The attack can therefore be carried out in an RL framework in which the agent's goal is to determine the optimal image location, that is, the location that causes the model to perform incorrectly when perturbed with random pixels. The adversarial attack proposed in the present invention shows encouraging results (maximum success rate = 77.59%). Researchers should explore system resilience scenarios (for example, cases in which the attacker has no access to the system) before using such models in surveillance applications.

Operations according to the present embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable storage medium. A computer-readable storage medium is any medium that participates in providing instructions to a processor for execution. It may contain program instructions, data files, data structures, or a combination of these; examples include magnetic media, optical recording media, and memory. The computer program may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiments can be readily inferred by programmers in the technical field to which the embodiments belong.

The present embodiments are intended to explain the technical idea of the invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be interpreted as falling within the scope of rights of the present embodiments.

100: adversarial attack device,
110: processor,
130: computer-readable storage medium,
131: program,
150: communication bus,
170: input/output interface,
190: communication interface

Claims

An adversarial attack method using reinforcement learning for a gait recognition system, the method comprising:
obtaining a deep learning-based gait recognition system that is the subject of analysis; and
performing an adversarial attack on the gait recognition system, the adversarial attack being a black-box attack using reinforcement learning (RL) in which the gait energy image (GEI) with which an agent interacts is the environment, a pixel location in the GEI is the state, every movement available to the agent in its current state is an action, and an action quality value determined from the prediction result provided by the gait recognition system is the reward,
wherein the actions are: the agent moving to the pixel above the pixel position of the current state, the agent moving to the pixel below the pixel position of the current state, the agent moving to the pixel to the left of the pixel position of the current state, and the agent moving to the pixel to the right of the pixel position of the current state,
wherein the performing of the adversarial attack consists of performing the adversarial attack on the gait recognition system based on Q-learning, and
wherein the Q-learning is an algorithm that takes the GEI and Q-learning algorithm parameters as input and outputs the target state reached by the agent, and that, after initializing hyperparameters including the number of episodes and the number of iterations per episode to the size of the state and action parameters, repeatedly performs, for each GEI and based on the number of iterations per episode and the number of episodes: a process in which the agent performs the action from the pixel position in the GEI of the current state; a process of generating an adversarial patch of a preset size based on the action performed by the agent and performing a state transition by sending an adversarial GEI, to which the adversarial patch has been added, to the gait recognition system; a process of determining the reward for the action based on the prediction result for the adversarial GEI provided by the gait recognition system; and a process of updating a Q-table that holds a Q-value for each state-action pair.

The method of claim 1,
wherein the gait recognition system is
an intelligent model based on a deep convolutional neural network (DCNN) that recognizes individuals by their walking style using a gait representation based on the gait energy image (GEI).

delete

The method of claim 2,
wherein the action is
the agent moving to another pixel in steps of a preset size from the pixel position of the current state.

delete

The method of claim 5,
wherein the performing of the adversarial attack consists of
generating the adversarial patch of size n×n, with randomly generated pixel values, centered on the pixel location that is the agent's state following the action performed by the agent, and adding the adversarial patch to the gait energy image (GEI) at the pixel location of the adversarial patch to obtain the adversarial GEI in which pixels are perturbed.

The method of claim 8,
wherein the performing of the adversarial attack consists of
determining a positive action quality value as the reward for the action if the prediction result for the adversarial GEI provided by the intelligent model corresponds to a result that lowers the confidence of the intelligent model, and determining a negative action quality value as the reward for the action if the prediction result for the adversarial GEI provided by the intelligent model does not correspond to a result that lowers the confidence of the intelligent model.

A computer program stored in a computer-readable storage medium to execute, on a computer, the adversarial attack method using reinforcement learning for a gait recognition system according to any one of claims 1, 2, 5, 8, and 9.

An adversarial attack device using reinforcement learning for a gait recognition system, the device comprising:
a memory that stores one or more programs for performing an adversarial attack using reinforcement learning (RL) on a deep learning-based gait recognition system; and
one or more processors that perform operations for carrying out the adversarial attack on the gait recognition system using the reinforcement learning according to the one or more programs stored in the memory,
wherein the processor
acquires the gait recognition system that is the subject of analysis, and
performs the adversarial attack on the gait recognition system, the adversarial attack being a black-box attack using the reinforcement learning (RL) in which the gait energy image (GEI) with which an agent interacts is the environment, a pixel location in the GEI is the state, every movement available to the agent in its current state is an action, and an action quality value determined from the prediction result provided by the gait recognition system is the reward,
wherein the actions are: the agent moving to the pixel above the pixel position of the current state, the agent moving to the pixel below the pixel position of the current state, the agent moving to the pixel to the left of the pixel position of the current state, and the agent moving to the pixel to the right of the pixel position of the current state,
wherein the processor performs the adversarial attack on the gait recognition system based on Q-learning, and
wherein the Q-learning is an algorithm that takes the GEI and Q-learning algorithm parameters as input and outputs the target state reached by the agent, and that, after initializing hyperparameters including the number of episodes and the number of iterations per episode to the size of the state and action parameters, repeatedly performs, for each GEI and based on the number of iterations per episode and the number of episodes: a process in which the agent performs the action from the pixel position in the GEI of the current state; a process of generating an adversarial patch of a preset size based on the action performed by the agent and performing a state transition by sending an adversarial GEI, to which the adversarial patch has been added, to the gait recognition system; a process of determining the reward for the action based on the prediction result for the adversarial GEI provided by the gait recognition system; and a process of updating a Q-table that holds a Q-value for each state-action pair.

The device of claim 11,
wherein the gait recognition system is
an intelligent model based on a deep convolutional neural network (DCNN) that recognizes individuals by their walking style using a gait representation based on the gait energy image (GEI).

delete

The device of claim 12,
wherein the processor
generates the adversarial patch of size n×n, with randomly generated pixel values, centered on the pixel location that is the agent's state following the action performed by the agent, and adds the adversarial patch to the gait energy image (GEI) at the pixel location of the adversarial patch to obtain the adversarial GEI in which pixels are perturbed.

The device of claim 16,
wherein the processor
determines a positive action quality value as the reward for the action if the prediction result for the adversarial GEI provided by the intelligent model corresponds to a result that lowers the confidence of the intelligent model, and determines a negative action quality value as the reward for the action if the prediction result for the adversarial GEI provided by the intelligent model does not correspond to a result that lowers the confidence of the intelligent model.