KR20220074715A

KR20220074715A - Method and apparatus for image processing

Info

Publication number: KR20220074715A
Application number: KR1020210130966A
Authority: KR
Inventors: 강동우; 리 웨이밍; 허 바오; 왕 창; 홍성훈; 마 린
Original assignee: 삼성전자주식회사
Priority date: 2020-11-27
Filing date: 2021-10-01
Publication date: 2022-06-03
Also published as: CN114565953A

Abstract

An image processing method and apparatus are provided. According to an embodiment, the image processing method includes obtaining a feature map of an image, determining a spatial position weight of a pixel point of the feature map, obtaining a corrected feature map by correcting the feature map based on the spatial position weight of the pixel point, and determining a key point based on the corrected feature map.

Description

Image processing method and apparatus

The following embodiments relate to an image processing method and apparatus.

Computer vision technology can make useful decisions about objects and scenes based on image recognition. Key point detection, also referred to as feature point or point-of-interest detection, can be applied to many tasks such as visual positioning. In visual positioning, pupil positioning and tracking of the human eye may be used in augmented reality (AR). For example, in a head-up display (HUD) device of a vehicle, the position on the windshield at which information is displayed can be determined only after the human eye has been positioned and tracked. Conventional image key point detection is generally based on shape constraint methods. A model may acquire statistical information on the feature point distribution of training image samples, and may learn the allowed directions of change of the feature points in order to find the positions of the corresponding feature points in a target image.

According to an embodiment, an image processing method may include obtaining a feature map of an image, determining a spatial position weight of a pixel point of the feature map, obtaining a corrected feature map by correcting the feature map based on the spatial position weight of the pixel point, and determining a key point based on the corrected feature map.

According to another embodiment, an image processing method may include: obtaining a candidate box of a feature map of an image; determining second relative distance information, which is relative distance information between a projection point, on the feature map within the candidate box, of a coordinate point of a first feature map of a preset size and an adjacent coordinate point of the projection point; determining a second interpolation coefficient based on the second relative distance information and the feature map within the candidate box; obtaining the first feature map by performing interpolation based on the second interpolation coefficient and the feature map within the candidate box; and performing processing based on the first feature map.

According to an embodiment, an image processing apparatus includes a processor and a memory including instructions executable by the processor. When the instructions are executed by the processor, the processor may obtain a feature map of an image, determine a spatial position weight of a pixel point of the feature map, obtain a corrected feature map by correcting the feature map based on the spatial position weight of the pixel point, and determine a key point based on the corrected feature map.

FIG. 1 is a flowchart illustrating an image processing method according to an embodiment.
FIG. 2 illustrates an image detection operation of a pupil positioning model according to an embodiment.
FIG. 3 illustrates a training process of a pupil positioning model according to an embodiment.
FIG. 4 is a flowchart illustrating an operation of obtaining an image through interpolation according to an embodiment.
FIG. 5 illustrates an operation of determining relative distance information according to an embodiment.
FIG. 6 illustrates a down-sampling process according to an embodiment.
FIG. 7 illustrates the structure of a down-sampling network according to an embodiment.
FIG. 8 illustrates an interference removal operation according to an embodiment.
FIG. 9 illustrates a training process of an interference removal model according to an embodiment.
FIG. 10 illustrates a training process of a tracking failure detection model according to an embodiment.
FIG. 11 illustrates a tracking operation on successive image frames according to an embodiment.
FIG. 12 is a flowchart illustrating an image processing method according to another embodiment.
FIG. 13 is a block diagram illustrating the configuration of an image processing apparatus according to an embodiment.
FIG. 14 is a block diagram illustrating the configuration of an electronic device according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only, and the embodiments may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, and substitutes that fall within the technical idea described by the embodiments.

Although terms such as first and second may be used to describe various components, these terms should be interpreted only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to the other component, but it should be understood that intervening components may also be present.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

Pupil positioning and tracking may be required to implement augmented reality (AR). For example, in a head-up display (HUD) device of a vehicle, the position on the windshield at which information is displayed can be determined only after the driver's pupils have been positioned and tracked. Likewise, in a portable device having a three-dimensional (3D) display, the position within the device at which 3D information such as a 3D icon or a 3D video is displayed may be determined according to the 3D position of the pupils.

When an interfering object such as an occlusion is present in the image during pupil positioning, data annotated with the occlusion may be used directly. In this case, a model may directly estimate the occlusion state of the facial key points and, at the same time, estimate the position and confidence of each key point. The face may be divided into different regions, and each region may correspond to one edge. Information about an occluded edge may be inferred using the spatial and feature relationships among the edges.

However, many current data sets do not include such occlusion annotations, so occlusion labeling information cannot be relied on across different data sets. In addition, because the occlusion conditions of the different regions must be inferred iteratively during training, computational efficiency may be reduced.

When an interfering object (for example, an occluding object such as glasses) is present in the image, the interfering object may first be removed from the image. Glasses removal and glasses addition may be handled at the same time. The eye region and the face region may be separated and encoded separately. The two codes, one for the face of a person wearing glasses and one for the eye region of a person not wearing glasses, may be combined, and a face image without glasses may be obtained through a network.

Synthesizing an image from the face and eye regions of different people is concerned mainly with whether the new image looks realistic, and does not guarantee that the spatial shape of the eye region is preserved without significant change. As a result, the position of the pupil may change.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description with reference to the drawings, the same components are assigned the same reference numerals regardless of the drawing in which they appear, and redundant descriptions thereof are omitted.

FIG. 1 is a flowchart illustrating an image processing method according to an embodiment.

As shown in FIG. 1, according to the image processing method, a feature map of an image is obtained in step 110. The image may be an image of a human face. The image may be obtained from a video to be examined, and the feature map may be extracted from the image using a feature extraction network.

In step 120, a spatial position weight of a pixel point of the feature map is determined. The spatial position weight may be used to correct the pixel points of the feature map. The spatial position weight may be determined based on the initial positions of the key points of the image and the feature map. Specifically, the initial positions of the key points of the image may be roughly determined, a first weight may be determined by adjusting the initial positions, a second weight may be determined from the feature map, and the spatial position weight may be determined from the first weight and the second weight.

Here, feature extraction may first be performed on the image and classification may then be performed to determine the initial positions of the key points. The initial positions of the key points may include a plurality of vectors, each corresponding to one key point. Each vector may represent the probability distribution of the key point over the positions in the image. According to an embodiment, the spatial position weight may be derived by multiplying the first weight and the second weight point by point.

The specific process of determining the spatial position weight is described in detail below.

In step 130, a corrected feature map is obtained by correcting the feature map based on the spatial position weight of the pixel point. In step 140, a key point may be determined based on the corrected feature map. The key points may include not only key points of the pupils of the eyes but also key points corresponding to facial features and the facial shape. The key point positions may be determined by inputting the corrected feature map into a classification network.

Hereinafter, the determination of the key point positions is described.

For example, the image may be a face image, and the key point positions may include key point positions corresponding to the pupils. FIG. 2 illustrates an image detection operation of a pupil positioning model 200 according to an embodiment. The key point positions may be determined through the following process of the pupil positioning model 200.

1) The image 201 of the current frame may be input to the feature extraction network 211 of the first network 210. The resolution of the image 201 may be h*w. For example, the feature extraction network 211 may be MobileNet v2, a lightweight neural network based on a residual structure.

2) The features output from the feature extraction network 211 may be input to the first classification network 212 of the first network 210. For example, the first classification network 212 may include a fully connected layer and may output the initial positions 203 of the key points. The initial positions 203 may be denoted by a_k, where k = 1, 2, ..., K and K is the number of key points. In addition, one of the layers of the feature extraction network 211 may output the feature map 202, which may be denoted by F.

3) The initial positions 203 of the key points may be input to the shape adjustment network 221 of the second network 220 to obtain the first weight 204. The first weight 204 may be denoted by w_struc, and its size may be h*w*1. The shape adjustment network 221 may include a fully connected layer.

4) The feature map 202 may be input to the second classification network 222 of the second network 220 to obtain the second weight 205. The second weight 205 may be denoted by w_appear, and its size may be h*w*1. The second classification network 222 may include a convolutional layer.

5) The spatial position weight may be determined based on w_struc and w_appear. The spatial position weight may be determined as

$$w = w_{struc} \otimes w_{appear}$$

where w denotes the spatial position weight and $\otimes$ denotes pointwise multiplication.

6) A corrected feature map 206 may be generated through a correction operation based on the spatial position weight w and the feature map 202. The corrected feature map 206 may be denoted by F' and may be generated as

$$F' = F \otimes w$$

that is, by multiplying F and w pointwise.

7) The corrected feature map 206 may be input to the third classification network 230, which may include a fully connected layer. The key point positions 207 may be obtained through the third classification network 230.
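As an illustration only, steps 5) and 6) can be sketched in plain Python. The sketch assumes that the weights are stored as h x w nested lists (the h*w*1 maps with the last axis dropped) and that the feature map is a list of channels of the same spatial size; the function names and the data layout are hypothetical and are not taken from the specification.

```python
def pointwise_multiply(a, b):
    """Multiply two h x w maps (lists of rows) element by element."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def correct_feature_map(feature_map, w_struc, w_appear):
    """Return (w, F') with w = w_struc * w_appear and F' = F * w per channel.

    feature_map: list of channels, each an h x w map (assumed layout).
    w_struc, w_appear: h x w maps standing in for the first and second weights.
    """
    w = pointwise_multiply(w_struc, w_appear)
    corrected = [pointwise_multiply(channel, w) for channel in feature_map]
    return w, corrected


# Toy 2 x 2 example with two channels.
F = [[[1.0, 2.0], [3.0, 4.0]],
     [[10.0, 20.0], [30.0, 40.0]]]
w_struc = [[1.0, 0.5], [0.0, 1.0]]
w_appear = [[0.5, 1.0], [1.0, 0.25]]
w, F_corrected = correct_feature_map(F, w_struc, w_appear)
```

A position is kept at full strength only where both weights are high, so a position that either the shape branch or the appearance branch scores as zero is suppressed in every channel of the corrected feature map.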

Through the above embodiment, the spatial position weights of the pixel points are determined from the feature map 202 of the image 201, the feature map 202 is corrected based on the spatial position weights to obtain the corrected feature map 206, and the key point positions 207 are detected from the corrected feature map 206, which may improve the accuracy of key point position detection. Here, predicted key points may be obtained from the feature map 202 using a classification network or a regression network.

Hereinafter, the structure of the pupil positioning model 200 is described.

As shown in FIG. 2, the pupil positioning model 200 may be divided into two parts. The first network 210 may include the feature extraction network 211 and the first classification network 212. The feature map 202 and the initial positions 203 of the key points may be obtained by inputting the image 201 of the current frame to the first network 210. The second network 220 may include the shape adjustment network 221 and the second classification network 222. The first weight 204 may be obtained by inputting the initial positions 203 of the key points to the shape adjustment network 221, and the second weight 205 may be obtained by inputting the feature map 202 to the second classification network 222. The spatial position weight may be determined based on the first weight 204 and the second weight 205, the corrected feature map 206 may be determined based on the spatial position weight and the feature map 202, and the key point positions 207 may be obtained based on the corrected feature map 206.

Hereinafter, the training process of the pupil positioning model 200 is described.

As shown in FIG. 3, three loss functions may be used to train the pupil positioning model 300.

The first loss function (loss 1) may represent the loss due to the difference between the initial positions a_k of the key points and the actual key point positions. The actual key point positions may be referred to as ground-truth key point positions. The second loss function (loss 2) may represent the loss due to the difference between the key point positions and the actual key point positions. The second loss function (loss 2) may be defined identically to or differently from the first loss function (loss 1). The first loss function (loss 1) and the second loss function (loss 2) may be any of various types of loss function, such as smooth L1 or L2.
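For reference, the two loss types named above can be sketched as follows. This is a minimal sketch using the commonly used definitions of the L2 and smooth L1 losses; the `beta` threshold, the averaging over key points or coordinates, and the function names are assumptions and are not specified in the text.

```python
def l2_loss(pred, target):
    """Mean squared Euclidean distance between predicted and true key points."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(pred)


def smooth_l1_loss(pred, target, beta=1.0):
    """Smooth L1: quadratic for errors below beta, linear above it."""
    total = 0.0
    count = 0
    for p, t in zip(pred, target):
        for pc, tc in zip(p, t):
            diff = abs(pc - tc)
            if diff < beta:
                total += 0.5 * diff * diff / beta
            else:
                total += diff - 0.5 * beta
            count += 1
    return total / count


# Two key points given as (x, y): one nearly correct, one far off.
pred = [(1.0, 2.0), (4.0, 6.0)]
truth = [(1.5, 2.0), (1.0, 2.0)]
loss_l2 = l2_loss(pred, truth)
loss_smooth = smooth_l1_loss(pred, truth)
```

The smooth L1 form grows only linearly for large errors, so a single badly misplaced key point influences the loss less than it does under L2.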

To assign a higher weight to accurately predicted points, a third loss function (loss 3) may be defined as in Equation 1.

In Equation 1, L_3 denotes the third loss function (loss 3) and w_struc denotes the first weight; e is described below. When the initial position a_k of each key point is predicted by the feature extraction network 311 and the first classification network 312 of the first network 310 of the pupil positioning model 300, and the actual (ground-truth) key point position is given, a value c_k may be calculated for each key point from a_k and the ground-truth position, where k = 1, 2, ..., K and K is the number of key points. A map of size h*w*1 may be initialized to 0, and each c_k value may be projected onto the map. The projection may be based on the position of the predicted key point on the feature map F. Two values may be projected onto the same position; in that case, if the newly projected value is greater than the existing value, the existing value may be replaced by the new value, and otherwise the existing value may be kept. The map obtained in this way may correspond to e.
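The construction of e described above can be sketched as follows. The sketch takes the c_k values as given inputs, since their formula is not reproduced here, and assumes that the predicted key point positions have already been converted to integer (row, column) indices on the map; skipping positions that fall outside the map is likewise an assumption, and the function name is hypothetical.

```python
def build_target_map(height, width, positions, values):
    """Project per-key-point values c_k onto an h x w map, keeping the maximum.

    positions: integer (row, col) of each predicted key point (assumed format).
    values: the c_k value of each key point, computed elsewhere.
    """
    e = [[0.0] * width for _ in range(height)]  # map initialized to 0
    for (row, col), c_k in zip(positions, values):
        if 0 <= row < height and 0 <= col < width:
            # When two values land on the same position, keep the larger one.
            if c_k > e[row][col]:
                e[row][col] = c_k
    return e


# Key points 0 and 2 fall on the same position (0, 1).
e = build_target_map(3, 4, [(0, 1), (2, 2), (0, 1)], [0.3, 0.9, 0.7])
```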

The second network 320 of the pupil positioning model 300 may include a shape adjustment network 321 and a second classification network 322, and may be used to recalculate the spatial position weight w of the key points. The spatial position weight w may be determined through an operation 323 between the first weight w_struc and the second weight w_appear. This recalculation may be iterated, which amounts to running the second network 320 of the pupil positioning model 300 once or several times. Once the corrected feature map F' is determined through an operation 324 between the spatial position weight w and the feature map F, the third classification network 330 may extract the key point positions 302 from the corrected feature map F'. Each of the operations 323 and 324 may correspond to pointwise multiplication.

The detection of key point positions and the structure and training method of the pupil positioning model 300 have been described above. Hereinafter, a process of obtaining an image through interpolation is described.

According to embodiments, as shown in FIG. 4, the following steps may further be performed before the feature map of the image is obtained in step 110.

In step 410, a first interpolation coefficient may be determined based on first relative distance information and a first image. The first relative distance information may be relative distance information between the projection point, on the first image, of a pixel point of the image and the pixel points adjacent to the projection point.

The resolution of the first image may be H*W, and the image obtained through resolution reduction may have a resolution of h*w. H, W, h, and w may be natural numbers.

According to an embodiment, step 410 may include the following steps.

1) For any pixel point of the image, determining the projection point of the pixel point on the first image.

Determining the projection point of the pixel point on the first image may include the following steps.

a. Determining the initial resolution of the first image, and obtaining the target resolution of the image.

b. Projecting the pixel point of the image onto the first image based on the target resolution and the initial resolution, to obtain the projection point of the pixel point on the first image.

The resolution of the first image may be H*W, the image may have a resolution of h*w after resolution reduction, and the coordinates of the projection point may be determined according to Equation 2 below.

[Equation 2]

In Equation 2, the coordinates of the projection point are expressed in terms of the coordinates of the pixel point P', the resolution H*W of the first image, and the resolution h*w of the image.
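One simple way to realize such a projection is proportional scaling by the ratio of the two resolutions. The sketch below illustrates this under that assumption only; it is not confirmed to be the exact form of Equation 2, it ignores any half-pixel offset, and the function and parameter names are hypothetical.

```python
def project_to_first_image(x, y, image_size, first_image_size):
    """Map pixel (x, y) of the h x w image onto the H x W first image.

    Assumes plain proportional scaling by the resolution ratio.
    image_size: (h, w); first_image_size: (H, W).
    """
    h, w = image_size
    H, W = first_image_size
    return x * W / w, y * H / h


# A pixel of a 3 x 4 image projected onto a 6 x 8 first image.
px, py = project_to_first_image(1, 2, (3, 4), (6, 8))
```

With a non-integer resolution ratio the projected coordinates are generally fractional, which is why the adjacent pixel points of the projection point are needed in the next step.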

2) Obtaining the pixel points adjacent to the projection point on the first image, and determining the first relative distance information between the adjacent pixel points and the projection point.

For example, as shown in FIG. 5, a projection point 501 may be located within a rectangular grid 502 formed by the four pixel points 503 adjacent to the projection point 501. The first relative distance information may be determined from the relative distances d0, d1, d2, and d3 between the projection point 501 and the four edges of the grid 502.
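The relative distances can be sketched as follows for a projection point with fractional coordinates on a unit-spaced pixel grid. The assignment of d0 to d3 to the left, top, right, and bottom edges is an assumption made for illustration, since the text does not state the order, and handling of points on the image border is omitted.

```python
import math


def relative_distances(px, py):
    """Distances from a projection point to the four edges of its grid cell.

    Returns ((d0, d1, d2, d3), neighbors), where the distances are measured to
    the left, top, right, and bottom edges (assumed order) and neighbors are
    the four adjacent pixel points as (x, y) pairs.
    """
    x0 = math.floor(px)
    y0 = math.floor(py)
    d_left = px - x0
    d_top = py - y0
    d_right = 1.0 - d_left
    d_bottom = 1.0 - d_top
    neighbors = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]
    return (d_left, d_top, d_right, d_bottom), neighbors


distances, neighbors = relative_distances(2.25, 4.5)
```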

3) Determining the first interpolation coefficient based on the first relative distance information and the first image.

For example, similarly to bilinear interpolation, a pixel point of the image may be projected onto the first image, and the first interpolation coefficient may be obtained according to the adjacent pixel points on the first image.

In step 410, determining the first interpolation coefficient based on the first relative distance information and the first image may include the following steps.

a. Extracting features of the first image.

b. Splicing the features with the first relative distance information to obtain a first spliced feature.

c. Performing convolution on the first spliced feature to obtain the first interpolation coefficient.

In step 420, the image may be obtained by performing interpolation based on the first interpolation coefficient and the pixel points of the first image.

For example, similarly to bilinear interpolation, a pixel point of the image may be projected onto the first image, and the value of the pixel point of the image may be obtained by interpolating over several pixel points in the neighborhood of the projection on the first image. The image may be generated in this way.

The process of obtaining the image from the first image, as shown in FIG. 6, may be referred to as a down-sampling process. The down-sampling process may be implemented by a down-sampling network 600. Given a first image 601 with a resolution of H*W, the first image 601 may be reduced to an image 604 with a resolution of h*w. This process may be similar to bilinear interpolation: the pixel points of the image 604 may be projected onto the first image 601, and the values of the pixel points of the image 604 may be obtained by interpolation using the adjacent pixel points on the first image 601. The first interpolation coefficient 603 may be obtained through a network 610, which may be a convolutional network. The network 610 may have two inputs: the pixel value of each pixel point in the first image 601, and the first relative distance information 602. The first image 601 may be convolved and then spliced with the first relative distance information 602, and a further convolution may then yield the first interpolation coefficient 603.

Interpolation may be performed according to Equation 3 below.

[Equation 3]

\[ I' = \sum_{i} \alpha_i I_i \]

Here, I' may be a pixel point of the image 604, I_i may be the i-th adjacent pixel point of the first image 601 (denoted I), and α_i may be the first interpolation coefficient 603 of the i-th pixel point, where α_i ≥ 0 and \(\sum_{i} \alpha_i = 1\). That is, the first interpolation coefficient 603 of each pixel point may be greater than or equal to 0, and the first interpolation coefficients 603 of the pixel points may sum to 1. Performing interpolation according to the pixel points of the first image 601 and the first interpolation coefficients 603 may yield the corresponding pixel point on the image 604.
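The weighted sum behind Equation 3 can be sketched as follows. This is only an illustration: the `normalize` helper is one hypothetical way to obtain coefficients that are non-negative and sum to 1 (the text does not say how the network enforces this), and `bilinear_coeffs` shows classical bilinear weights as a fixed special case of the learned coefficients.

```python
def normalize(raw_scores):
    """Map non-negative raw scores to coefficients that are >= 0 and sum to 1.

    Hypothetical normalization; the source only states the two constraints.
    """
    total = sum(raw_scores)
    if total <= 0:
        # Fall back to uniform weights when every score is zero.
        return [1.0 / len(raw_scores)] * len(raw_scores)
    return [r / total for r in raw_scores]

def interpolate_pixel(neighbors, coeffs):
    """One output pixel as the weighted sum of its adjacent pixel values."""
    assert len(neighbors) == len(coeffs)
    return sum(a * p for a, p in zip(coeffs, neighbors))

def bilinear_coeffs(dx, dy):
    """Classical bilinear weights for the neighbors ordered
    top-left, top-right, bottom-left, bottom-right."""
    return [(1 - dx) * (1 - dy), dx * (1 - dy), (1 - dx) * dy, dx * dy]
```

With the projection point at the center of its cell, `bilinear_coeffs(0.5, 0.5)` weights the four neighbors equally, so the output is their average. Learned coefficients would replace these fixed weights while still satisfying the same two constraints.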

The above process is described further below. As shown in FIG. 7, a down-sampling network 700 may include K convolutional layers 711, a sigmoid layer 712, a mypool layer (splicing layer) 713, a convolutional layer 714, a sigmoid layer 715, and a mycomb layer (fusion layer) 716. The mypool layer 713 may splice the output of the K convolutional layers 711 for the first image 701 with the first relative distance information 702 to generate a feature map of size h*w. The convolutional layer 714 may perform a convolution operation on this feature map to generate a data block of size h*w*4, which may represent the first interpolation coefficients for the four adjacent pixel points. The mycomb layer 716 may compute a weighted sum of the four adjacent pixel values using the four corresponding first interpolation coefficients to obtain the final pixel value of the image 703.

In this example, the first interpolation coefficients are not distinguished between the channels of the first image 701. Alternatively, a separate set of first interpolation coefficients may be inferred for each channel of the first image 701. It is also possible to perform convolution on the first image 701 in a plurality of branches to obtain feature maps with a plurality of receptive fields, and to combine these feature maps to obtain the first interpolation coefficients.

In the above-described embodiment, the down-sampling network 700 may generate the first interpolation coefficients by splicing the first relative distance information 702 with the feature map. In another embodiment, various transformations (for example, squaring) may first be applied to the first relative distance information 702, and the first interpolation coefficients may be calculated based on the transformed first relative distance information and the feature map. In addition, the way of combining the feature and the first relative distance information is not limited to splicing in the mypool layer 713: the splicing may be performed in another layer, or a combination method other than splicing may be used.

In the above-described embodiment, when the first image 701 is processed, the image feature information of the first image 701 and the first relative distance information 702 of the projection point may be combined as input. In this case, the influence of the position of the projection point may be better reflected in the input, so that the obtained image 703 may have the target resolution while the image features of the first image 701 are also preserved.

The process of obtaining the image through interpolation has been described above. The process of obtaining the first image by removing an interfering object from the eye region is described below.

According to an embodiment, the following steps may be further performed before the first interpolation coefficient is determined in step 410 based on the first relative distance information and the first image.

1) Cropping an eye region image block from a second image to obtain an image that does not include the eye region (the eye region image block includes the interfering object)

2) Determining a pupil weight map based on the second image

3) Obtaining an eye region image block from which the interfering object has been removed, based on the pupil weight map and the eye region image block

4) Splicing the eye region image block from which the interfering object has been removed with the image that does not include the eye region, to obtain the first image or the image

In the above-described embodiment, the first image may be obtained by removing the interfering object from the second image, and the image may then be obtained from the first image. Alternatively, the image may be obtained directly by removing the interfering object from the second image.

The interfering object may be any element located in the eye region other than the eye itself. For example, the interfering object in the eye region may include glasses.

As shown in FIG. 8, taking the case where the interfering object in the eye region of an original image 801 is a pair of glasses as an example, the interfering object may be removed using an interference removal model. The original image 801 may be referred to as the second image. In step 810, the interference removal model may detect the eye region in the original image 801 and determine the eye region image block that includes the interfering object. In step 830, the interference removal model may crop the eye region image block that includes the interfering object from the original image 801 to obtain an image 802 without the eye region.

In step 820, the interference removal model may roughly position the pupil in the original image 801 to determine a pupil region. The pupil region obtained here may be less accurate than the pupil position given by the key point positions, and may specify only the approximate position of the pupil. The interference removal model may run a convolutional layer on the eye region image block and the pupil region to generate a pupil weight map 803 in which the pupil region within the eye region is given a high weight. For example, the pupil weight map 803 may be based on a Gaussian distribution function whose maximum lies around the pupil center. Based on the pupil weight map 803 and the image 802 without the eye region, the interference removal model may obtain the first image 804 from which the interfering object has been removed.
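A pupil weight map of the kind mentioned above can be illustrated with a minimal sketch. It builds the map analytically from a Gaussian centered on a rough pupil position, whereas the text describes generating the map with a convolutional layer; the function name and the `sigma` parameter are assumptions for illustration.

```python
import math

def gaussian_weight_map(height, width, center, sigma):
    """Weight map that peaks at the rough pupil center and decays with distance.

    center is (cx, cy) in pixel coordinates; sigma controls how quickly the
    weight falls off (hypothetical parameter, not taken from the source).
    """
    cx, cy = center
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

The resulting weights lie in (0, 1] and equal 1 exactly at the chosen center, so pixels near the pupil dominate any weighted operation applied to the eye region image block.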

The training process of the interference removal model is described below.

As shown in FIG. 9, the interference removal model 910 may be trained with the aid of a verification model 920. The verification model 920 may determine whether an image 902 generated by the interference removal model 910 is a naked-face image, that is, an image of a face without the interfering object. The loss of the loss function may include a loss of the interference removal model 910 and a loss of the verification model 920. A pupil positioning loss 911 and an edge matching loss 912 may be used for the interference removal model 910. The image 902 may be provided to a pupil positioning model 930, and the resulting pupil positioning loss 911 of the pupil positioning model 930 may be used as a loss of the interference removal model 910. The pupil positioning loss 911 may also be obtained with a pupil positioning method other than the pupil positioning model 930.

When the interference removal model 910 removes the interfering object from an original image 901 and generates a result image 902, eye regions may be detected through eye detection 941 and 942 on the original image 901 and the result image 902, respectively, and edge detection 943 and 944 may be performed on the respective eye regions to generate two edge images. Gaussian smoothing may be applied to the edge detection results. A loss between the two edge images (for example, an L1 loss and/or an L2 loss) may be calculated and used as the edge matching loss 912. This loss may be used for gradient backpropagation. Because only the edges of the image 902 are considered when the edge matching loss 912 is calculated, the influence of noise may be removed.
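The edge matching loss can be sketched as an L1 distance between two edge images. The gradient-magnitude edge detector below is only a stand-in, since the text does not name a specific edge detector, and the Gaussian smoothing step is omitted for brevity. All function names are hypothetical.

```python
def edge_map(img):
    """Simple gradient-magnitude edge image (stand-in edge detector).

    img is a list of rows of gray values; borders are handled by clamping.
    """
    h, w = len(img), len(img[0])
    edges = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            edges[y][x] = abs(gx) + abs(gy)
    return edges

def l1_loss(a, b):
    """Mean absolute difference between two equally sized images."""
    n = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

def edge_matching_loss(eye_original, eye_result):
    """L1 loss between the edge images of the two eye regions."""
    return l1_loss(edge_map(eye_original), edge_map(eye_result))
```

The loss is zero when the result keeps the edges of the original eye region and grows as eye contours move or disappear, which is the property that ties the eye and pupil positions of the result image to those of the original.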

The loss of the verification model 920 may represent the difference between the result image 902 and a real naked-face image 903, and may be defined as a cross-entropy loss. For example, a Patch-GAN (patch generative adversarial network) may be used. The edge matching loss 912 may reflect the loss over the eye region, and the loss of the verification model 920 may reflect the loss over the entire image region (for example, the image 902).

In the above-described embodiment, adopting the edge matching method may keep the positions of the eye and the pupil unchanged, so that the key point positions of the pupil may be calculated more accurately.

The process of obtaining the first image by removing the interfering object from the eye has been described above. The process of determining the feature map and the detected key point positions based on the face region, and then determining the reliability, is described below.

According to an embodiment, the following steps may be performed when the feature map of the image is obtained in step 110.

1) Determining the face region of the image based on a reference face region when the reference face region has been obtained from the previous frame image of the video to be detected

2) Detecting the face region of the image when no reference face region has been obtained from the previous frame image of the video to be detected

3) Extracting the feature map from the face region

A tracking failure detection model may be used here. For the previous frame image of the video to be detected, the reliability of the corrected feature map of the previous frame image may be determined. If the reliability is greater than a preset threshold, key point detection for the previous frame has succeeded, and the face region of the previous frame image may be used as the face region of the image.

In other words, if key point detection succeeds for the previous frame image, the face region of the image may be determined based on the face region of the previous frame image; if key point detection fails for the previous frame image, face region detection may be performed again on the current frame.

According to an embodiment, the following steps may be further performed.

1) Determining the reliability of the corrected feature map

2) Determining whether target tracking has succeeded based on the reliability

If the reliability is greater than the preset threshold, target tracking has succeeded, and the face region of the image may be set as the reference face region for the next frame image of the video to be detected.

Determining the reliability of the feature map may include performing a convolution operation, a fully connected operation, and a softmax operation on the feature map to obtain the reliability of the feature map.

A two-dimensional vector may be generated by performing convolution on the obtained feature map of the image and then applying the fully connected and softmax operations to the result. One element of the two-dimensional vector may be the tracking failure probability, and the other may be the tracking success probability. The reliability of the feature map may be set from this two-dimensional vector. If the reliability is less than or equal to the preset threshold, tracking (that is, key point detection) may be regarded as having failed; if the reliability is greater than the preset threshold, tracking (that is, key point detection) may be regarded as having succeeded.
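The final softmax and threshold decision can be sketched as follows. The convolution and fully connected operations are not modeled; the sketch starts from the two raw scores they would produce. The element order (failure, success), the use of the success probability as the reliability, and the default threshold of 0.5 are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tracking_succeeded(logits, threshold=0.5):
    """Return (success flag, reliability) from two raw scores.

    logits is assumed to be ordered (failure score, success score); the
    reliability is taken to be the success probability.
    """
    p_fail, p_success = softmax(logits)
    return p_success > threshold, p_success
```

Note that a reliability exactly equal to the threshold counts as a failure, matching the "less than or equal to" rule in the text.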

If tracking succeeded for the previous frame image, the bounding rectangle of the key points of the face image detected in the previous frame may be calculated, and the larger of its width and height may be taken. A square may then be obtained with the center point fixed and a side length equal to s times that maximum value, where s may be an integer. This square may be used as the face box of the new frame, that is, the face box of the image. If tracking failed for the previous frame, the face detection module may be run again on the current frame to detect the face box.
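The face box construction described above can be sketched directly. The function name and the default value s=2 are assumptions for illustration; the text only states that s may be an integer.

```python
def next_face_box(key_points, s=2):
    """Square face box for the next frame, derived from detected key points.

    key_points is a list of (x, y) tuples. The box is returned as
    (x_min, y_min, x_max, y_max). The default s=2 is a hypothetical value.
    """
    xs = [p[0] for p in key_points]
    ys = [p[1] for p in key_points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Side length: s times the larger of the bounding rectangle's width and height.
    side = s * max(x_max - x_min, y_max - y_min)
    # Keep the center of the bounding rectangle fixed.
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    half = side / 2
    return (cx - half, cy - half, cx + half, cy + half)
```

For key points (10, 20), (30, 20), and (20, 50), the bounding rectangle is 20 wide and 30 high, so with s=2 the square has side 60 and is centered at (20, 35).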

The reliability may be determined using a tracking failure detection model. The tracking failure detection model may include a convolutional layer, a fully connected layer, and a softmax layer. Whether key point detection has succeeded may be determined by inputting the feature map of the image into the tracking failure detection network.

The training process of the tracking failure detection model is described below.

As shown in FIG. 10, a tracking failure detection model 1020 may be trained by connecting it to a trained pupil positioning model 1010. In this training process, only the parameters of the tracking failure detection model 1020 may be adjusted. The pupil positioning model 1010 may be trained in advance, and its parameters may be fixed during this training process.

The pupil positioning model 1010 may predict a key point position 1003 from a face image 1001. When a feature map 1002 (for example, F or F') is generated through a convolution operation during prediction by the pupil positioning model 1010, the feature map 1002 may be provided to the tracking failure detection model 1020. The tracking failure detection model 1020 may include a series of convolutional layers, a fully connected layer, and a softmax layer, and may generate a tracking evaluation value 1006 through a convolution operation, a fully connected operation, and a softmax operation. The tracking evaluation value 1006 may indicate the reliability of the key point position 1003. For example, the tracking evaluation value 1006 may take a value of 0 or 1, where 0 may indicate tracking failure and 1 may indicate tracking success. Tracking failure may be defined based on the distance between the key point position 1003 predicted by the pupil positioning model 1010 and the actual key point position (ground truth, GT).

A reliability may be defined from the actual key point position (GT) and the predicted key point position 1003. Given the predicted key point position a', the ground truth, the reliability, and a threshold p in the interval (0, 1), the tracking evaluation value 1006 of the tracking failure detection model 1020 may be defined as in Equation 4 below.

[Equation 4]

The tracking evaluation value 1006 may be a two-dimensional vector. One of its values may represent the tracking failure probability, and the other may represent the tracking success probability. The loss function 1030 for the tracking decision may be defined using cross entropy. The key point position 1003 may be obtained by inputting the face image 1001 of an arbitrary frame into the pupil positioning model 1010. A prediction error 1004 may be determined based on the key point position 1003 and the actual key point position (GT), and the tracking evaluation value 1006 may be determined by the tracking failure detection model 1020. The loss function 1030 may then be determined from the tracking evaluation value 1006 and the prediction error 1004, and the parameters of the tracking failure detection model 1020 may be adjusted accordingly.
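The training target and the cross-entropy loss can be illustrated with a rough sketch. Equation 4 is not reproduced in this text, so the mapping from prediction error to reliability below (an exponential decay of the Euclidean distance) is purely a hypothetical stand-in, as are the function names and the default threshold p=0.5. Only the general idea, a 0/1 target derived from the distance to the ground truth and compared against the two-element prediction with cross entropy, follows the description.

```python
import math

def reliability_from_error(predicted, ground_truth):
    """Hypothetical reliability: 1 at zero error, decaying with distance.

    This is NOT Equation 4 of the source; it is an assumed stand-in.
    """
    return math.exp(-math.dist(predicted, ground_truth))

def target_label(predicted, ground_truth, p=0.5):
    """1 (tracking success) if the reliability exceeds the threshold p, else 0."""
    return 1 if reliability_from_error(predicted, ground_truth) > p else 0

def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy loss for a two-element (failure, success) probability vector."""
    return -math.log(max(probs[label], eps))
```

During training, only the parameters producing `probs` (the tracking failure detection model) would be updated, while the model producing `predicted` stays fixed, as stated in the text.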

In the above-described embodiment, the reliability of the corrected feature map F' of the image may be determined. If the reliability is less than or equal to the preset threshold, tracking (that is, key point detection) may be regarded as having failed; if the reliability is greater than the preset threshold, tracking (that is, key point detection) may be regarded as having succeeded. This may increase the accuracy of key point detection. In addition, when key point detection succeeds, using the face region of the image as the reference face region of the next frame image to be detected may reduce the time needed to process the next frame image and improve computational efficiency.

An application scenario of the above-described image processing method is described below.

According to an embodiment, once the key point position of the pupil has been obtained, a step of adjusting the three-dimensional display effect of a display interface based on that position may be further performed.

For example, the image processing method may be applied to a portable device having a three-dimensional display. In this case, the key point position corresponding to the pupil, that is, the three-dimensional position of the pupil, may be determined; the three-dimensional display effect of the interface of the portable device (for example, the display effect of three-dimensional icons or three-dimensional video on the interface) may be determined according to the three-dimensional position of the pupil; and the three-dimensional display effect of the interface may be adjusted as the position of the user's pupil changes.

An example of image processing using the above-described image processing method is described below.

According to an embodiment, the image may be a face image, the key points may include pupil key points, and the interfering object in the eye region may be glasses. As shown in FIG. 11, the image processing method may include the following steps.

1) Obtaining a second image 1101, that is, an image in which glasses are present as the interfering object in the eye region

2) Inputting the second image 1101 into an interference removal model 1110 to obtain a first image 1102 from which the interfering object has been removed (the resolution of the first image 1102 is H*W)

3) Inputting the first image 1102 into a down-sampling network 1120 to obtain an image (the resolution of the image is h*w)

4) Inputting the image into a pupil positioning model to estimate the key point positions, inputting the feature map of the image into a tracking failure detection model to determine the reliability of the feature map, and determining whether tracking has succeeded according to the reliability

5) If key point detection succeeds, estimating the initial face box of the next frame based on the key points, obtaining the original image of the next frame, and repeating the key point detection process for the next frame image

6) If key point detection fails, when processing the next frame image, performing face detection again to estimate the initial face box instead of estimating it from the key points of the image, obtaining the original image of the next frame, and repeating the key point detection process for the next frame image
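The frame-by-frame control flow of steps 1) to 6) can be sketched as a loop. The three callables passed in (`detect_face`, `locate_key_points`, `box_from_key_points`) are hypothetical placeholders for the face detector, the combined pipeline of interference removal, down-sampling, pupil positioning, and reliability estimation, and the face box estimation; the default threshold is also an assumption.

```python
def track_video(frames, detect_face, locate_key_points, box_from_key_points,
                threshold=0.5):
    """Process frames in order, reusing the face box while tracking succeeds.

    detect_face(frame) -> box
    locate_key_points(frame, box) -> (key_points, reliability)
    box_from_key_points(key_points) -> box
    All three callables are hypothetical stand-ins for the models in the text.
    """
    reference_box = None
    results = []
    for frame in frames:
        # Run full face detection only when no reference box is available.
        box = reference_box if reference_box is not None else detect_face(frame)
        key_points, reliability = locate_key_points(frame, box)
        success = reliability > threshold
        results.append((key_points, success))
        # On success, seed the next frame; on failure, force re-detection.
        reference_box = box_from_key_points(key_points) if success else None
    return results
```

The design point is that the comparatively expensive face detector runs only on the first frame and after a tracking failure, which is where the reduction in per-frame processing time described in the text would come from.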

According to the above-described image processing method, the spatial position weights of the pixel points in the feature map of the image are determined, the feature map is corrected based on the spatial position weights of the pixel points to obtain a corrected feature map, and the key point positions are detected from the corrected feature map, so that the accuracy of key point position detection may be improved.

In addition, when the first image 1102 is processed, the image feature information of the first image 1102 and the first relative distance information of the projection point may be combined as input. In this case, the influence of the position of the projection point may be better reflected in the input, so that the obtained image may have the target resolution while the image features of the first image 1102 are also preserved.

In addition, adopting the edge matching method may keep the positions of the eye and the pupil unchanged, so that the key point positions of the pupil may be calculated more accurately.

In addition, when the reliability of the corrected feature map of the image is determined, tracking (that is, key point detection) may be regarded as having failed if the reliability is less than or equal to the preset threshold, and as having succeeded if the reliability is greater than the preset threshold. This may increase the accuracy of key point detection. Furthermore, when key point detection succeeds, using the face region of the image as the reference face region of the next frame image to be detected may reduce the time needed to process the next frame image and improve computational efficiency.

According to an embodiment, as shown in FIG. 12, the image processing method may include the following steps.

In step 1210, a candidate box of the feature map of the image may be obtained.

The candidate box may be obtained by inputting the image into a target detection network (for example, a region proposal network (RPN)).

In step 1220, second relative distance information may be determined. The second relative distance information is the relative distance information between a projection point, obtained by projecting a coordinate point of a first feature map of a preset size onto the feature map in the candidate box, and the coordinate points adjacent to the projection point.

In step 1230, a second interpolation coefficient may be determined based on the second relative distance information and the feature map in the candidate box.

According to an embodiment, operation 1230 may include the following steps.

a. obtaining a second splicing feature according to the feature map in the candidate box and the second relative distance information;

b. performing convolution based on the second splicing feature to obtain the second interpolation coefficient.

Alternatively, operation 1230 may include the following steps.

a. performing convolution based on the feature map in the candidate box to obtain a convolutional feature map;

b. determining the second interpolation coefficient based on the convolutional feature map and the second relative distance information.

When convolution is performed based on the feature map in the candidate box, the second interpolation coefficient may be obtained by splicing the resulting convolutional feature map with the second relative distance information.

In operation 1240, the first feature map may be obtained by performing interpolation based on the second interpolation coefficient and the feature map in the candidate box. Through operation 1240, the feature map in the candidate box may be scaled to the first feature map of the preset size.

For example, similarly to bilinear interpolation, any one feature of the first feature map in the first candidate box is projected onto the feature map in the candidate box to obtain the corresponding projection point, and interpolation is performed based on several features adjacent to the projection point on the feature map in the candidate box to obtain that feature of the first feature map in the first candidate box. In this way, the first feature map in the first candidate box is generated, and the first candidate box may be obtained.
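The projection-and-interpolation step can be sketched as below. This is a hedged stand-in: the function name `resize_with_coefficients` is hypothetical, and the four coefficients used here are plain bilinear weights computed from the relative distances, whereas the described method obtains its interpolation coefficients from a convolution over the spliced features.

```python
import math
from typing import List

Grid = List[List[float]]


def resize_with_coefficients(src: Grid, dst_h: int, dst_w: int) -> Grid:
    """Scale a 2D feature map to (dst_h, dst_w) by weighting the four
    coordinate points adjacent to each projection point."""
    src_h, src_w = len(src), len(src[0])
    out = [[0.0] * dst_w for _ in range(dst_h)]
    for i in range(dst_h):
        for j in range(dst_w):
            # Projection point of target (i, j) on the source feature map.
            py = min(max((i + 0.5) * src_h / dst_h - 0.5, 0.0), src_h - 1.0)
            px = min(max((j + 0.5) * src_w / dst_w - 0.5, 0.0), src_w - 1.0)
            y0, x0 = int(math.floor(py)), int(math.floor(px))
            y1, x1 = min(y0 + 1, src_h - 1), min(x0 + 1, src_w - 1)
            dy, dx = py - y0, px - x0
            # Stand-in coefficients (bilinear). A learned variant would
            # replace this list with the output of a small network.
            coeffs = [(1 - dy) * (1 - dx), (1 - dy) * dx,
                      dy * (1 - dx), dy * dx]
            neighbours = [src[y0][x0], src[y0][x1], src[y1][x0], src[y1][x1]]
            out[i][j] = sum(c * v for c, v in zip(coeffs, neighbours))
    return out
```

Keeping the coefficients in a separate list makes the substitution point explicit: only that line changes when fixed bilinear weights are swapped for predicted ones.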

In operation 1250, processing may be performed based on the first feature map.

Optionally, the processing may be another task such as target detection, target classification, or target instance segmentation. However, the processing is not limited thereto. In addition, once the candidate box of the feature map of the image is obtained, the candidate box may be used to determine a target category and a target position in the image.

In the above-described embodiment, the first feature map of the preset size may be obtained by scaling the feature map in the candidate box to the preset size through the down-sampling network.

In the image processing method described above, a down-sampling network is applied to target detection: the second interpolation coefficient is calculated based on the down-sampling network, the feature map in the candidate box of the image to be detected is scaled to obtain the first feature map in the first candidate box, and target detection is performed based on the scaled first feature map. The accuracy of target detection may thereby be improved.

The image processing method has been described above. Hereinafter, the image processing method is described from the viewpoint of an apparatus.

FIG. 13 is a block diagram illustrating a configuration of an image processing apparatus according to an embodiment. Referring to FIG. 13, the image processing apparatus 1300 includes a processor 1310 and a memory 1320. The memory 1320 is connected to the processor 1310 and may store instructions executable by the processor 1310, data to be operated on by the processor 1310, or data processed by the processor 1310. The memory 1320 may include a non-transitory computer-readable medium, such as a high-speed random access memory and/or a non-volatile computer-readable storage medium (e.g., one or more disk storage devices, flash memory devices, or other non-volatile solid-state memory devices). For example, the memory 1320 may store at least one of a pupil positioning model, a down-sampling network, an interferer removal model, a verification model, and a tracking failure detection model.

The processor 1310 may execute instructions for performing the operations described with reference to FIGS. 1 to 12 and 14. For example, the processor 1310 may obtain a feature map of an image, determine a spatial position weight of a pixel point of the feature map, obtain a corrected feature map by correcting the feature map based on the spatial position weight of the pixel point, and determine a key point according to the corrected feature map.

The processor 1310 may detect an initial position of the key point of the image, obtain a first weight according to the initial position, obtain a second weight according to the feature map, and determine the spatial position weight based on the first weight and the second weight.
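The weighting and correction described in the two preceding paragraphs can be sketched as follows. Everything specific here is an assumption made for illustration: the Gaussian form of the first weight, the sigmoid form of the second weight, combining the two by multiplication, correcting the feature map by element-wise multiplication, and reading the key point as the arg-max of the corrected map. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def first_weight_from_position(h: int, w: int, initial: Tuple[int, int],
                               sigma: float = 1.0) -> Grid:
    """Assumed form: Gaussian prior centred on the initial key point."""
    cy, cx = initial
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def second_weight_from_features(feature: Grid) -> Grid:
    """Assumed form: element-wise sigmoid of the feature map."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in feature]


def locate_key_point(feature: Grid, initial: Tuple[int, int]) -> Tuple[int, int]:
    """Return the (row, column) of the strongest corrected response."""
    h, w = len(feature), len(feature[0])
    w1 = first_weight_from_position(h, w, initial)
    w2 = second_weight_from_features(feature)
    best_value, best_pos = -math.inf, (0, 0)
    for y in range(h):
        for x in range(w):
            spatial_weight = w1[y][x] * w2[y][x]        # combine both weights
            corrected = feature[y][x] * spatial_weight  # corrected feature map
            if corrected > best_value:
                best_value, best_pos = corrected, (y, x)
    return best_pos
```

In this toy setup, a slightly weaker response close to the initial position wins over a stronger response far from it, which is the intended effect of a position-dependent weight.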

Before obtaining the feature map of the image, the processor 1310 may determine a first interpolation coefficient based on first relative distance information and a first image, and may obtain the image by performing interpolation based on the first interpolation coefficient and pixel points of the first image. The first relative distance information may be relative distance information between a projection point, on the first image, of a pixel point of the image and a pixel point adjacent to the projection point.

The processor 1310 may extract a feature of the first image, obtain a first splicing feature by splicing the feature and the first relative distance information, and obtain the first interpolation coefficient by performing convolution based on the first splicing feature. Before determining the first interpolation coefficient based on the first relative distance information and the first image, the processor 1310 may crop an eye region image block including an interferer from a second image to obtain an image that does not include the eye region, determine a pupil weight map according to the second image, obtain an eye region image block from which the interferer has been removed based on the pupil weight map and the eye region image block, and obtain the first image or the image by splicing the eye region image block from which the interferer has been removed and the image that does not include the eye region.

The processor 1310 may determine the reliability of the feature map and determine whether tracking has succeeded according to the reliability. The processor 1310 may obtain the reliability of the feature map by performing a convolution operation, a fully connected operation, and a softmax operation based on the feature map.
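The convolution, fully connected, and softmax chain can be sketched on a tiny single-channel map as follows. This is a minimal illustration under stated assumptions: the function names, the single 2D kernel, the two-class output, and the convention that index 1 means "tracking succeeded" are all hypothetical, and the weights would come from training in practice. As in most deep learning frameworks, the "convolution" here is implemented as cross-correlation.

```python
import math
from typing import List

Grid = List[List[float]]


def conv2d_valid(x: Grid, kernel: Grid) -> Grid:
    """Single-channel 'valid' convolution (cross-correlation, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(x[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reliability(feature: Grid, kernel: Grid,
                fc_weights: List[List[float]], fc_bias: List[float]) -> float:
    """Convolution, then a fully connected layer, then softmax."""
    flat = [v for row in conv2d_valid(feature, kernel) for v in row]
    logits = [sum(w * v for w, v in zip(ws, flat)) + b
              for ws, b in zip(fc_weights, fc_bias)]
    return softmax(logits)[1]  # index 1 = "tracking succeeded" (assumed)
```

The returned value lies in (0, 1), so it can be compared directly against a preset threshold.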

The processor 1310 may adjust a 3D display effect of a display interface based on the position of the key point.

The image processing apparatus 1300 determines the spatial position weight of the pixel point of the feature map of the image, corrects the feature map based on the spatial position weight of the pixel point, and obtains the corrected feature map. The key point position is obtained by performing detection on the corrected feature map, which may improve the accuracy of key point position detection.

In addition, when processing the first image, the image processing apparatus 1300 combines the image feature information of the first image and the first relative distance information of the projection point as an input, so that the influence of the position of the projection point is better reflected in the input. The image obtained in this way may have the target resolution while the image features of the first image are also preserved.

In addition, when the reliability of the corrected feature map of the image is determined, a reliability less than or equal to a preset threshold may be regarded as a tracking failure, that is, a key point detection failure. A reliability greater than the preset threshold may be regarded as a tracking success, that is, a key point detection success. Accordingly, the accuracy of key point detection may be improved. In addition, when key point detection succeeds, the face region of the image may be used as the reference face region of the next frame image of the video to be detected, which may improve the processing efficiency for the next frame image.

FIG. 14 is a block diagram illustrating a configuration of an electronic device according to an embodiment. Referring to FIG. 14, the electronic device 1400 may include a processor 1410, a memory 1420, a camera 1430, a storage device 1440, an input device 1450, an output device 1460, and a network interface 1470, which may communicate with each other through a communication bus 1480. For example, the electronic device 1400 may be implemented as at least a part of a mobile device such as a mobile phone, a smart phone, a PDA, a netbook, a tablet computer, or a laptop computer; a wearable device such as a smart watch, a smart band, or smart glasses; a computing device such as a desktop or a server; a home appliance such as a television, a smart television, or a refrigerator; a security device such as a door lock; or a vehicle such as an autonomous vehicle or a smart vehicle. The electronic device 1400 may structurally and/or functionally include the face detection device 100 of FIG. 1.

The processor 1410 executes functions and instructions to be executed in the electronic device 1400. For example, the processor 1410 may process instructions stored in the memory 1420 or the storage device 1440. The processor 1410 may perform one or more of the operations described with reference to FIGS. 1 to 13. The memory 1420 may include a computer-readable storage medium or a computer-readable storage device. The memory 1420 may store instructions to be executed by the processor 1410, and may store related information while software and/or applications are executed by the electronic device 1400. For example, the memory 1420 may store at least one of a pupil positioning model, a down-sampling network, an interferer removal model, a verification model, and a tracking failure detection model.

The camera 1430 may capture photos and/or video. The storage device 1440 includes a computer-readable storage medium or a computer-readable storage device. The storage device 1440 may store a larger amount of information than the memory 1420 and may store the information for a long period of time. For example, the storage device 1440 may include a magnetic hard disk, an optical disk, a flash memory, a floppy disk, or another form of non-volatile memory known in the art.

The input device 1450 may receive input from a user through traditional input methods such as a keyboard and a mouse, and through newer input methods such as touch input, voice input, and image input. For example, the input device 1450 may include a keyboard, a mouse, a touch screen, a microphone, or any other device capable of detecting input from a user and transmitting the detected input to the electronic device 1400. The output device 1460 may provide the output of the electronic device 1400 to the user through a visual, auditory, or tactile channel. The output device 1460 may include, for example, a display, a touch screen, a speaker, a vibration generating device, or any other device capable of providing output to a user. The network interface 1470 may communicate with an external device through a wired or wireless network.

A device provided in the embodiments of the present application (e.g., the image processing apparatus 1300 or the electronic device 1400) may implement at least one of its multiple modules through an artificial intelligence (AI) model. Functions related to AI may be performed through a non-volatile memory, a volatile memory, and a processor.

The processor may include one or more processors. The one or more processors may be general-purpose processors such as a central processing unit (CPU) or an application processor (AP), graphics-only processing units such as a graphics processing unit (GPU) or a video processor unit (VPU), and/or AI-dedicated processors such as a neural network processing unit (NPU).

The one or more processors control the processing of input data according to a predefined operating rule or an AI model stored in the non-volatile memory and the volatile memory. The predefined operating rule or AI model is provided through training or learning.

Here, being provided through learning means that a predefined operating rule or an AI model with desired characteristics is obtained by applying a learning algorithm to a plurality of pieces of learning data. The learning may be performed in the device itself in which the AI according to an embodiment is performed, and/or may be implemented through a separate server or system.

The AI model may include a plurality of neural network layers. Each layer has a plurality of weight values, and the calculation of one layer is performed using the calculation result of the previous layer and the plurality of weights of the current layer. Examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a generative adversarial network (GAN), and a deep Q-network.
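As a small illustration of a layer calculation that uses the previous layer's result and the current layer's weights, the sketch below computes one fully connected layer. The function name `dense_layer`, the bias term, and the ReLU activation are assumptions chosen for the example; the text does not prescribe a particular layer type.

```python
from typing import List


def dense_layer(prev_output: List[float], weights: List[List[float]],
                bias: List[float]) -> List[float]:
    """One layer: weighted sum of the previous layer's output plus a bias,
    followed by a ReLU activation (assumed)."""
    return [max(0.0, sum(w * x for w, x in zip(row, prev_output)) + b)
            for row, b in zip(weights, bias)]
```

Stacking such calls, each one consuming the output of the one before it, gives the layer-by-layer computation described above.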

A learning algorithm is a method of training a predetermined target device (e.g., a robot) using a plurality of pieces of training data so as to cause, allow, or control the target device to make a determination or a prediction. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

The embodiments described above may be implemented by hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but one of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in a computer-readable recording medium.

The method according to an embodiment may be implemented in the form of program instructions executable through various computer means and recorded in a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known and available to those skilled in the art of computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, those of ordinary skill in the art may apply various technical modifications and variations based on them. For example, an appropriate result may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

1. An image processing method comprising:
obtaining a feature map of an image;
determining a spatial position weight of a pixel point of the feature map;
obtaining a corrected feature map by correcting the feature map based on the spatial position weight of the pixel point; and
determining a key point according to the corrected feature map.

2. The method of claim 1,
wherein the determining of the spatial position weight comprises:
detecting an initial position of the key point of the image;
obtaining a first weight according to the initial position;
obtaining a second weight according to the feature map; and
determining the spatial position weight based on the first weight and the second weight.

3. The method of claim 1, further comprising,
before the obtaining of the feature map of the image:
determining a first interpolation coefficient based on first relative distance information and a first image; and
obtaining the image by performing interpolation based on the first interpolation coefficient and pixel points of the first image,
wherein the first relative distance information is relative distance information between a projection point, on the first image, of a pixel point of the image and a pixel point adjacent to the projection point.

4. The method of claim 3,
wherein the determining of the first interpolation coefficient comprises:
extracting a feature of the first image;
obtaining a first splicing feature by splicing the feature and the first relative distance information; and
obtaining the first interpolation coefficient by performing convolution based on the first splicing feature.

5. The method of claim 4, further comprising,
before the determining of the first interpolation coefficient based on the first relative distance information and the first image:
cropping an eye region image block including an interferer from a second image to obtain an image not including the eye region;
determining a pupil weight map according to the second image;
obtaining an eye region image block from which the interferer has been removed, based on the pupil weight map and the eye region image block; and
obtaining the first image or the image by splicing the eye region image block from which the interferer has been removed and the image not including the eye region.

6. The method of claim 1, further comprising:
determining a reliability of the feature map; and
determining whether tracking has succeeded according to the reliability.

7. The method of claim 6,
wherein the determining of the reliability comprises
obtaining the reliability of the feature map by performing a convolution operation, a fully connected operation, and a softmax operation based on the feature map.

8. The method of claim 1, further comprising
adjusting a 3D display effect of a display interface based on a position of the key point.

9. An image processing method comprising:
obtaining a candidate box of a feature map of an image;
determining second relative distance information corresponding to relative distance information between a projection point, on the feature map in the candidate box, of a coordinate point of a first feature map of a preset size and a coordinate point adjacent to the projection point;
determining a second interpolation coefficient based on the second relative distance information and the feature map in the candidate box;
obtaining the first feature map by performing interpolation based on the second interpolation coefficient and the feature map in the candidate box; and
performing processing based on the first feature map.

10. The method of claim 9,
wherein the determining of the second interpolation coefficient comprises:
obtaining a second splicing feature according to the feature map in the candidate box and the second relative distance information; and
obtaining the second interpolation coefficient by performing convolution based on the second splicing feature.

11. A computer program stored in a computer-readable recording medium, in combination with hardware, to execute the method of any one of claims 1 to 10.

12. An image processing apparatus comprising:
a processor; and
a memory containing instructions executable by the processor,
wherein, when the instructions are executed by the processor, the processor is configured to:
obtain a feature map of an image;
determine a spatial position weight of a pixel point of the feature map;
obtain a corrected feature map by correcting the feature map based on the spatial position weight of the pixel point; and
determine a key point according to the corrected feature map.

13. The image processing apparatus of claim 12,
wherein the processor is configured to:
detect an initial position of the key point of the image;
obtain a first weight according to the initial position;
obtain a second weight according to the feature map; and
determine the spatial position weight based on the first weight and the second weight.

14. The image processing apparatus of claim 12,
wherein the processor is configured to,
before obtaining the feature map of the image:
determine a first interpolation coefficient based on first relative distance information and a first image; and
obtain the image by performing interpolation based on the first interpolation coefficient and pixel points of the first image,
wherein the first relative distance information is
relative distance information between a projection point, on the first image, of a pixel point of the image and a pixel point adjacent to the projection point.

15. The image processing apparatus of claim 14,
wherein the processor is configured to:
extract a feature of the first image;
obtain a first splicing feature by splicing the feature and the first relative distance information; and
obtain the first interpolation coefficient by performing convolution based on the first splicing feature.

16. The image processing apparatus of claim 15,
wherein the processor is configured to,
before determining the first interpolation coefficient based on the first relative distance information and the first image:
crop an eye region image block including an interferer from a second image to obtain an image not including the eye region;
determine a pupil weight map according to the second image;
obtain an eye region image block from which the interferer has been removed, based on the pupil weight map and the eye region image block; and
obtain the first image or the image by splicing the eye region image block from which the interferer has been removed and the image not including the eye region.

17. The image processing apparatus of claim 12,
wherein the processor is configured to:
determine a reliability of the feature map; and
determine whether tracking has succeeded according to the reliability.

18. The image processing apparatus of claim 17,
wherein the processor is configured to
obtain the reliability of the feature map by performing a convolution operation, a fully connected operation, and a softmax operation based on the feature map.

19. The image processing apparatus of claim 12,
wherein the processor is configured to
adjust a 3D display effect of a display interface based on a position of the key point.