KR20210145271A

KR20210145271A - Motion recognition method and apparatus, electronic device, computer readable storage medium

Info

Publication number: KR20210145271A
Application number: KR1020217036106A
Authority: KR
Inventors: Jianchao Wu; Jiaqi Duan; Zhanghui Kuang; Wei Zhang
Original assignee: Shenzhen SenseTime Technology Co., Ltd.
Priority date: 2020-03-11
Filing date: 2021-02-22
Publication date: 2021-12-01
Also published as: CN111401205A; JP2022529299A; CN111401205B; TW202135002A; WO2021179898A1

Abstract

Embodiments of the present invention provide a motion recognition method and apparatus, an electronic device, and a computer-readable storage medium. The motion feature information is determined from the object box corresponding to the target object rather than from the entire frame image, which effectively reduces the amount of data processed for motion recognition in each frame image. More images can therefore be used for motion recognition, which helps improve recognition accuracy. In addition, motion classification and recognition do not rely on the motion feature information of the target object alone: the video clip and the determined motion feature information are also used to extract scene feature information of the scene in which the target object is located and time-series feature information associated with the motion of the target object. Combining the scene feature information and the time-series feature information with the motion feature information further improves the accuracy of motion recognition.

Description

Motion recognition method and apparatus, electronic device, computer readable storage medium

Cross-Reference to Related Applications

The present invention is filed based on, and claims priority to, Chinese patent application No. 202010166148.8, filed on March 11, 2020, the entire contents of which are incorporated herein by reference.

The present invention relates to the fields of computer technology and image processing, and in particular to a motion recognition method and apparatus, an electronic device, and a computer-readable storage medium.

Motion detection and recognition are widely applied in fields such as robotics, safety, and health. In the related art, motion recognition suffers from low accuracy because of factors such as the limited data processing capability of the recognition device and the use of only a single type of data for recognition.

In view of this, the present invention provides at least a motion recognition method and apparatus, an electronic device, and a computer-readable storage medium.

In a first aspect, the present invention provides a motion recognition method, comprising:

acquiring a video clip;

determining motion feature information of a target object based on an object box of the target object in a key frame image of the video clip;

determining scene feature information and time-series feature information corresponding to the target object based on the video clip and the motion feature information; and

determining a motion type of the target object based on the motion feature information, the scene feature information, and the time-series feature information.

In the embodiments of the present invention, the motion feature information is determined from the object box corresponding to the target object rather than from the entire frame image. This effectively reduces the amount of data processed for motion recognition in each frame image, so that more images can be used for motion recognition, which helps improve recognition accuracy. In addition, in this aspect motion classification and recognition do not rely on the motion feature information of the target object alone: the video clip and the determined motion feature information are also used to extract scene feature information of the scene in which the target object is located and time-series feature information associated with the motion of the target object. Combining the scene feature information and the time-series feature information with the motion feature information further improves the accuracy of motion recognition.

In a possible embodiment, the motion recognition method further comprises determining the object box in the key frame image, which comprises:

selecting a key frame image from the video clip;

performing object detection on the selected key frame image to determine an initial object bounding box of the target object in the key frame image; and

expanding the initial object bounding box according to preset expansion size information to obtain the object box of the target object in the key frame image.

In this embodiment, object detection is used to determine the box of the target object in the image, which reduces the amount of data to be processed for motion recognition. A relatively small initial object bounding box is determined first and then expanded, so that the object box used for motion recognition contains more complete information about the target object and more environmental information. Retaining more spatial detail in this way helps improve the accuracy of motion recognition.
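The expansion step can be illustrated with a minimal sketch. The source does not specify the form of the preset expansion size information, so this sketch assumes it is a single ratio applied symmetrically about the box centre; the names `Box`, `expand_box`, and `expand_ratio` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def expand_box(box: Box, expand_ratio: float, img_w: int, img_h: int) -> Box:
    """Grow `box` about its centre by `expand_ratio`, then clamp to the image.

    `expand_ratio` is an assumed stand-in for the preset expansion size
    information; 0.2 makes the box 20% wider and 20% taller.
    """
    if expand_ratio < 0:
        raise ValueError("expand_ratio must be non-negative")
    dw = (box.x2 - box.x1) * expand_ratio / 2.0
    dh = (box.y2 - box.y1) * expand_ratio / 2.0
    return Box(
        max(0.0, box.x1 - dw),
        max(0.0, box.y1 - dh),
        min(float(img_w), box.x2 + dw),
        min(float(img_h), box.y2 + dh),
    )


# Hypothetical detector output for the target object in a 640x480 key frame.
initial_box = Box(100.0, 50.0, 200.0, 250.0)
object_box = expand_box(initial_box, 0.2, img_w=640, img_h=480)
```

Clamping keeps the expanded box inside the frame, so a target near the image border still yields a valid crop region.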

In a possible embodiment, determining the motion feature information of the target object based on the object box of the target object in the key frame image of the video clip comprises:

for the key frame image, selecting from the video clip a plurality of associated images corresponding to the key frame image;

cropping a partial image from each of at least some of the associated images corresponding to the key frame image according to the object box corresponding to the key frame image, to obtain a plurality of target object images corresponding to the key frame image; and

determining the motion feature information of the target object based on the plurality of target object images corresponding to the key frame image.

In this embodiment, the object box of the target object in the key frame image is used for localization, and the target object images used to determine the motion feature information are cropped from the plurality of associated images associated with the key frame image. This improves the precision of the images used to determine the motion feature information and allows more images to be used for that purpose, thereby improving the accuracy of motion recognition.
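The cropping step can be sketched as follows: one object box, taken from the key frame image, is applied to every associated image. This is only an illustration under simplifying assumptions; frames are toy grayscale images stored as nested lists, the box uses integer pixel coordinates with exclusive right and bottom edges, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]          # toy grayscale frame: rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region covered by `box` out of a single frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def crop_associated_images(frames: Sequence[Frame], box: Box) -> List[Frame]:
    """Apply the key frame's object box to each associated image."""
    return [crop(frame, box) for frame in frames]


def make_frame(t: int, h: int = 4, w: int = 6) -> Frame:
    """Synthetic frame whose pixel values encode time, row, and column."""
    return [[t * 100 + r * w + c for c in range(w)] for r in range(h)]


associated_images = [make_frame(t) for t in range(3)]
target_object_images = crop_associated_images(associated_images, (1, 1, 4, 3))
```

Each associated image contributes one crop, so the number of target object images grows with the number of associated images while each crop stays much smaller than a full frame.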

In a possible embodiment, selecting from the video clip the plurality of associated images corresponding to the key frame image comprises:

selecting from the video clip a first sub-video clip that contains the key frame image, wherein the first sub-video clip further contains N images temporally adjacent to the key frame image, N being a positive integer; and

selecting the plurality of associated images from the first sub-video clip.

In this embodiment, the images associated with the key frame image are selected from a sub-video clip captured close in time to the key frame image, so the selected images are those most closely associated with the key frame image. Determining the motion feature information from these images improves its accuracy.
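A minimal sketch of this two-stage selection, working on frame indices only. The source does not state how the N adjacent images are positioned around the key frame or how associated images are picked from the sub-clip; the sketch assumes a window centred on the key frame (shifted inward at clip boundaries) and uniform sampling with a fixed stride. The function names and the `stride` parameter are hypothetical.

```python
from typing import List


def select_sub_clip(num_frames: int, key_idx: int, n: int) -> List[int]:
    """Indices of a window holding the key frame plus n adjacent frames.

    Assumption: the window is centred on the key frame where possible and
    shifted inward when the key frame is near either end of the clip.
    """
    if not 0 <= key_idx < num_frames:
        raise ValueError("key_idx out of range")
    if n < 1 or n + 1 > num_frames:
        raise ValueError("need 1 <= n < num_frames")
    start = key_idx - n // 2
    start = max(0, min(start, num_frames - (n + 1)))
    return list(range(start, start + n + 1))


def sample_associated(indices: List[int], stride: int) -> List[int]:
    """Pick every `stride`-th frame of the sub-clip as an associated image."""
    if stride < 1:
        raise ValueError("stride must be positive")
    return indices[::stride]


first_sub_clip = select_sub_clip(num_frames=64, key_idx=32, n=8)
associated = sample_associated(first_sub_clip, stride=2)
```

With an even `n` and a stride that divides `n // 2`, the key frame itself is among the sampled associated images away from the clip boundaries.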

In a possible embodiment, after the plurality of target object images are obtained and before the motion feature information of the target object is determined, the method further comprises:

setting each target object image to an image having a preset image resolution.

In this embodiment, after a target object image is cropped, it is set to a preset resolution. This increases the amount of information contained in the target object image, that is, the cropped target object image can be enlarged, which helps capture fine-grained details of the target object and thereby improves the accuracy of the determined motion feature information.
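As a rough illustration of setting a crop to a preset resolution, the sketch below enlarges a toy image with nearest-neighbour sampling in pure Python. The interpolation method is an assumption made for brevity (the source does not name one, and a practical system would more likely use bilinear interpolation from an image library); `resize_nearest` is a hypothetical helper.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Resize `img` to out_h x out_w by nearest-neighbour sampling."""
    if out_h < 1 or out_w < 1 or not img or not img[0]:
        raise ValueError("image and output size must be non-empty")
    in_h, in_w = len(img), len(img[0])
    return [
        # Map each output pixel back to its source pixel by integer scaling.
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


cropped = [[1, 2],
           [3, 4]]
resized = resize_nearest(cropped, out_h=4, out_w=4)
```

Bringing every crop to the same resolution also gives the downstream feature extractor inputs of a fixed size, regardless of how large the object box was.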

In a possible embodiment, determining the scene feature information and the time-series feature information corresponding to the target object based on the video clip and the motion feature information comprises:

performing a video scene feature extraction operation on at least some of the associated images to obtain the scene feature information;

performing a time-series feature extraction operation on objects other than the target object in the video clip to obtain initial time-series feature information; and

determining the time-series feature information corresponding to the target object based on the initial time-series feature information and the motion feature information.

In this embodiment, scene features are extracted from the associated images associated with the key frame image, which yields relatively complete scene feature information and thereby improves the accuracy of motion recognition. In addition, the time-series features of objects other than the target object, that is, the initial time-series feature information, are extracted, and the time-series feature information associated with the target object is determined based on the time-series features of the other objects and the motion feature information of the target object. Using this time-series feature information further improves the accuracy of motion recognition.

In a possible embodiment, performing the time-series feature extraction operation on objects other than the target object in the video clip to obtain the initial time-series feature information comprises:

for the key frame image, selecting from the video clip a second sub-video clip that contains the key frame image, wherein the second sub-video clip further contains P images temporally adjacent to the key frame image, P being a positive integer; and

extracting, from the images in the second sub-video clip, motion features of objects other than the target object, and using the extracted motion features as the initial time-series feature information.

In this embodiment, a sub-video clip captured relatively close in time to the key frame image is selected from the video clip for time-series feature extraction. This reduces the amount of extracted time-series feature data and strengthens the association between the determined time-series features and the key frame image, which helps improve the accuracy of motion recognition. In addition, using the motion features of other objects as the time-series features makes the temporal features used for motion recognition more targeted, which also helps improve accuracy.

In a possible embodiment, determining the time-series feature information corresponding to the target object based on the initial time-series feature information and the motion feature information comprises:

performing dimension reduction on the initial time-series feature information and on the motion feature information, respectively;

performing an average pooling operation on the dimension-reduced initial time-series feature information; and

performing a merging operation on the average-pooled initial time-series feature information and the dimension-reduced motion feature information to obtain the time-series feature information corresponding to the target object.

In this embodiment, when the time-series feature information is determined based on the initial time-series feature information and the motion feature information, dimension reduction is performed on both, which reduces the amount of data to be processed and helps improve the efficiency of motion recognition. In addition, performing average pooling on the dimension-reduced initial time-series feature information simplifies the time-series feature extraction steps, which further improves the efficiency of motion recognition.
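The three operations (dimension reduction, average pooling, merging) can be sketched on plain Python lists. Several details here are assumptions, since the source does not fix them: dimension reduction is modelled as a linear projection with a given weight matrix, pooling averages over the features of the other objects, and merging is done by concatenation. All function and parameter names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # shape: out_dim x in_dim


def reduce_dim(x: Sequence[float], weight: Matrix) -> Vector:
    """Linear projection standing in for the dimension-reduction step."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def mean_pool(features: Sequence[Vector]) -> Vector:
    """Element-wise average over a non-empty set of feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def temporal_feature(initial_ts: Sequence[Vector], motion: Vector,
                     w_ts: Matrix, w_motion: Matrix) -> Vector:
    """Reduce both inputs, pool the time-series side, then merge."""
    reduced_ts = [reduce_dim(f, w_ts) for f in initial_ts]
    pooled = mean_pool(reduced_ts)
    reduced_motion = reduce_dim(motion, w_motion)
    return pooled + reduced_motion  # merge by concatenation (assumed)


# Toy 4 -> 2 projection that keeps the first two components.
keep_first_two = [[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]]
other_object_features = [[2.0, 4.0, 9.0, 9.0],
                         [4.0, 8.0, 9.0, 9.0]]
target_motion_feature = [1.0, 5.0, 7.0, 7.0]
ts_feature = temporal_feature(other_object_features, target_motion_feature,
                              keep_first_two, keep_first_two)
```

In a trained model the projection weights would be learned; the fixed matrix above only makes the data flow easy to follow.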

The method further comprises: using the obtained time-series feature information corresponding to the target object as new initial time-series feature information, and returning to the step of performing dimension reduction on the initial time-series feature information and on the motion feature information, respectively.

In this embodiment, the time-series feature extraction operation, which determines the time-series feature information corresponding to the target object based on the initial time-series feature information and the motion feature information, is executed repeatedly, which improves the accuracy of the determined time-series feature information.
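The repeated execution can be pictured as a loop that feeds each output back in as the new initial time-series feature information. The sketch below is a toy under stated assumptions: dimension reduction is replaced by averaging adjacent pairs of components so that the merged output keeps the input dimension, merging is concatenation, and the number of rounds is a free parameter. None of these choices come from the source, and all names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def pair_average(x: Sequence[float]) -> Vector:
    """Toy dimension reduction: average adjacent pairs, halving the length."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def extract_once(bank: Sequence[Vector], motion: Vector) -> Vector:
    """One round: reduce, average-pool the bank, merge with reduced motion."""
    reduced = [pair_average(f) for f in bank]
    pooled = [sum(column) / len(reduced) for column in zip(*reduced)]
    return pooled + pair_average(motion)


def extract_repeatedly(bank: Sequence[Vector], motion: Vector,
                       rounds: int) -> Vector:
    """Feed each round's output back in as the new initial feature."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    feature = extract_once(bank, motion)
    for _ in range(rounds - 1):
        feature = extract_once([feature], motion)
    return feature


bank = [[2.0, 4.0, 8.0, 8.0], [6.0, 8.0, 10.0, 10.0]]
motion = [1.0, 5.0, 7.0, 7.0]
after_one = extract_repeatedly(bank, motion, rounds=1)
after_two = extract_repeatedly(bank, motion, rounds=2)
```

The motion feature of the target object is reused unchanged in every round, while the time-series side is progressively refined.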

In a second aspect, the present invention provides a motion recognition apparatus, comprising:

a video acquisition module configured to acquire a video clip;

a motion feature determination module configured to determine motion feature information of a target object based on an object box of the target object in a key frame image of the video clip;

a scene and time-series feature determination module configured to determine scene feature information and time-series feature information corresponding to the target object based on the video clip and the motion feature information; and

a motion recognition module configured to determine a motion type of the target object based on the motion feature information, the scene feature information, and the time-series feature information.

In a possible embodiment, the motion feature determination module is further configured to determine the object box in the key frame image by:

selecting a key frame image from the video clip;

performing object detection on the selected key frame image to determine an initial object bounding box of the target object in the key frame image; and

expanding the initial object bounding box according to preset expansion size information to obtain the object box of the target object in the key frame image.

In a possible embodiment, when determining the motion feature information of the target object based on the object box of the target object in the key frame image of the video clip, the motion feature determination module is configured to:

for the key frame image, select from the video clip a plurality of associated images corresponding to the key frame image;

crop a partial image from each of at least some of the associated images corresponding to the key frame image according to the object box corresponding to the key frame image, to obtain a plurality of target object images corresponding to the key frame image; and

determine the motion feature information of the target object based on the plurality of target object images corresponding to the key frame image.

In a possible embodiment, when selecting from the video clip the plurality of associated images corresponding to the key frame image, the motion feature determination module is configured to:

select from the video clip a first sub-video clip that contains the key frame image, wherein the first sub-video clip further contains N images temporally adjacent to the key frame image, N being a positive integer; and

select the plurality of associated images from the first sub-video clip.

In a possible embodiment, after the plurality of target object images are obtained and before the motion feature information of the target object is determined, the motion feature determination module is further configured to:

set each target object image to an image having a preset image resolution.

In a possible embodiment, when determining the scene feature information and the time-series feature information corresponding to the target object based on the video clip and the motion feature information, the scene and time-series feature determination module is configured to:

perform a video scene feature extraction operation on at least some of the associated images to obtain the scene feature information;

perform a time-series feature extraction operation on objects other than the target object in the video clip to obtain initial time-series feature information; and

determine the time-series feature information corresponding to the target object based on the initial time-series feature information and the motion feature information.

In a possible embodiment, when performing the time-series feature extraction operation on objects other than the target object in the video clip to obtain the initial time-series feature information, the scene and time-series feature determination module is configured to:

for the key frame image, select from the video clip a second sub-video clip that contains the key frame image, wherein the second sub-video clip further contains P images temporally adjacent to the key frame image, P being a positive integer; and

extract, from the images in the second sub-video clip, motion features of objects other than the target object, and use the extracted motion features as the initial time-series feature information.

In a possible embodiment, when determining the time-series feature information corresponding to the target object based on the initial time-series feature information and the motion feature information, the scene and time-series feature determination module is configured to:

perform dimension reduction on the initial time-series feature information and on the motion feature information, respectively;

perform an average pooling operation on the dimension-reduced initial time-series feature information; and

perform a merging operation on the average-pooled initial time-series feature information and the dimension-reduced motion feature information to obtain the time-series feature information corresponding to the target object.

In a possible embodiment, when determining the time-series feature information corresponding to the target object based on the initial time-series feature information and the motion feature information, the scene and time-series feature determination module is further configured to:

use the obtained time-series feature information corresponding to the target object as new initial time-series feature information, and return to the step of performing dimension reduction on the initial time-series feature information and on the motion feature information, respectively.

In a third aspect, the present invention provides an electronic device comprising a processor and a storage medium connected to each other. The storage medium stores machine-readable instructions executable by the processor, and when the electronic device runs, the processor executes the machine-readable instructions to perform the steps of the motion recognition method described above.

In a fourth aspect, the present invention further provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the motion recognition method described above.

In a fifth aspect, the present invention further provides a computer program comprising computer-readable code, wherein when the computer-readable code runs on an electronic device, a processor in the electronic device implements the steps of the motion recognition method described above.

Because the apparatus, the electronic device, and the computer-readable storage medium of the present invention include technical features that are substantially the same as or similar to those of any aspect of the method, or of any embodiment of any aspect, their effects can be understood by reference to the description of the effects of the method and are not repeated here.

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art may obtain other related drawings from these drawings without creative effort.
FIG. 1A is a flowchart of a motion recognition method provided in an embodiment of the present invention.
FIG. 1B is a schematic diagram of a network architecture provided in an embodiment of the present invention.
FIG. 2 is a flowchart of determining motion feature information of a target object in another motion recognition method provided in an embodiment of the present invention.
FIG. 3 is a flowchart of determining scene feature information and time-series feature information corresponding to the target object in yet another motion recognition method provided in an embodiment of the present invention.
FIG. 4 is a schematic structural diagram of a simplified time-series feature extraction module in an embodiment of the present invention.
FIG. 5 illustrates yet another motion recognition method provided in an embodiment of the present invention.
FIG. 6 is a schematic structural diagram of another motion recognition apparatus provided in an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of another electronic device provided in an embodiment of the present invention.

본 발명의 실시예의 목적, 기술 방안 및 장점을 더욱 명확하게 하기 위해, 아래에 본 발명의 실시예에서의 도면을 결합하여, 본 발명의 실시예에서의 기술 방안에 대해 명확하고 완전한 설명을 수행하고, 이해해야 할 것은, 본 발명에서 도면은 단지 설명 및 묘사의 목적으로만 사용되며, 본 발명의 보호 범위를 한정하기 위한 것은 아니다. 또한, 이해해야 할 것은, 예시적인 도면은 실물의 비례에 따라 그려지지 않았다. 본 발명에서 사용되는 흐름도는 본 발명의 일부 실시예에 따라 구현되는 동작을 도시하였다. 이해해야 할 것은, 흐름도의 동작 순서에 따라 구현되지 않아도 되고, 논리적 컨텍스트 관계가 없는 단계는 순서를 반대로 하거나 또는 동시에 실시할 수 있다. 이 밖에, 본 분야의 기술자는 본 발명 내용의 안내하에, 흐름도에 하나 또는 복수 개 다른 동작을 추가할 수 있고, 흐름도에서 하나 또는 복수 개 동작을 제거할 수도 있다.In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the drawings in the embodiments of the present invention are combined below to give a clear and complete description of the technical solutions in the embodiments of the present invention, It should be understood that, in the present invention, the drawings are used only for the purpose of description and description, and are not intended to limit the protection scope of the present invention. Also, it should be understood that the exemplary drawings are not drawn to scale. The flowchart used in the present invention illustrates the operations implemented in accordance with some embodiments of the present invention. It should be understood that the sequence of operations in the flowchart need not be implemented, and steps without a logical contextual relationship may be performed in the reverse order or concurrently. In addition, a person skilled in the art may add one or a plurality of other operations to the flowchart, and may remove one or more operations from the flowchart under the guidance of the present invention.

In addition, the described embodiments are merely some, not all, of the embodiments of the present invention. The components of the embodiments described and illustrated in the drawings herein may generally be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present invention, but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

It should be noted that the term "comprising" is used in the embodiments of the present invention to indicate the presence of the features stated after it, without excluding the addition of other features.

To address the technical problem of low recognition accuracy in current motion recognition, the present invention provides a motion recognition method and apparatus, an electronic device, and a computer-readable storage medium. The present invention determines motion feature information using the object box corresponding to the target object rather than the entire frame image. This effectively reduces the amount of data processed per frame image for motion recognition, so that more images can be used for motion recognition, which helps improve recognition accuracy. Furthermore, the present invention does not classify and recognize motion from the motion feature information of the target object alone: it also uses the video clip and the determined motion feature information to extract scenario feature information of the scenario in which the target object is located and time-series feature information associated with the motion of the target object. Combining the scenario feature information and the time-series feature information with the motion feature information further improves the accuracy of motion recognition.

The motion recognition method and apparatus, electronic device, and computer-readable storage medium of the present invention are described below through specific embodiments.

An embodiment of the present invention provides a motion recognition method. The method is applied to a hardware device, such as a terminal device, that performs motion recognition, and may be implemented by a processor executing a computer program. Specifically, as shown in FIG. 1A, the motion recognition method provided in the embodiment of the present invention includes the following steps.

In step S110, a video clip is acquired.

Here, the video clip is the clip on which motion recognition is to be performed. It includes a plurality of images, the images contain the target object whose motion is to be recognized, and the target object may be a person, an animal, or the like.

The video clip may be captured by the terminal device that performs motion recognition, using its own camera or other capturing device, or it may be captured by another capturing device and then transmitted to the terminal device that performs motion recognition.

In step S120, motion feature information of the target object is determined based on the object box of the target object in a key frame image of the video clip.

Here, the object box is the bounding box that surrounds the target object. Determining the motion feature information of the target object from the image information inside the bounding box reduces the amount of data the terminal device has to process.

Before the motion feature information is determined based on the object box, key frame images must first be selected from the video clip, and the object box of the target object in each key frame image must be determined.

In a specific implementation, key frame images may be selected from the video clip at a preset time interval. Other methods may of course be used; for example, the video clip may be divided into a plurality of sub-clips and one frame image extracted from each sub-clip as a key frame image. The present invention does not limit the method of selecting key frame images from the video clip.
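The two selection strategies described above can be sketched as follows. This is only an illustrative Python sketch: the function names, the use of a frame rate `fps` to convert the time interval into a frame step, and the choice of the middle frame of each sub-clip are assumptions, since the source does not fix any of them.

```python
def select_key_frames_by_interval(num_frames, fps, interval_s):
    """Return frame indices sampled every `interval_s` seconds (hypothetical helper)."""
    # Convert the time interval into a whole number of frames, at least 1.
    step = max(1, round(fps * interval_s))
    return list(range(0, num_frames, step))


def select_key_frames_by_subclips(num_frames, num_subclips):
    """Split the clip into equal sub-clips and take one frame from each.

    Taking the middle frame is an assumption; the source only says that
    one frame is extracted from each sub-clip.
    """
    indices = []
    for k in range(num_subclips):
        start = k * num_frames // num_subclips
        end = (k + 1) * num_frames // num_subclips  # exclusive
        if end > start:
            indices.append((start + end - 1) // 2)
    return indices
```

For a 100-frame clip at 25 frames per second, a one-second interval picks frames 0, 25, 50 and 75 under these assumptions.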

After a plurality of key frame images has been selected from the video clip, the motion feature information of the target object may be determined using the object box in every key frame image, or using the object boxes in only some of the selected key frame images. In the latter case, the object boxes in those key frame images are first extracted or determined, and the motion feature information of the target object is then determined using the extracted or determined boxes.

In a specific implementation, the object box may be determined by object detection, for example by applying a human body detector. Other methods may of course be used; the present invention does not limit the method of determining the object box.

In a specific implementation, the object box obtained by the human body detector may be used directly as the final object box for determining the motion feature information. However, the detected box may be a relatively small box around the target object. To capture more complete information about the target object and more information about its surroundings, each box obtained by the human body detector may instead be expanded according to preset expansion size information, yielding the final object box of the target object in each key frame image. The motion feature information of the target object is then determined using the final object boxes.

The expansion size information used to expand the object box is preset. For example, it may include a first extension length in the length direction of the object box and a second extension length in the width direction of the object box. The length of the object box is extended on both sides according to the first extension length, by half of the first extension length on each side. Likewise, the width of the object box is extended on both sides according to the second extension length, by half of the second extension length on each side.

The first and second extension lengths may be preset fixed values, or may be determined from the length and width of the object box obtained directly by the human body detector. For example, the first extension length may equal the length of the detected object box, and the second extension length may equal its width.
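The box expansion described above can be illustrated with a short Python sketch. Several details here are assumptions rather than part of the source: the box is given as corner coordinates `(x1, y1, x2, y2)`, the "length" direction is taken to be the horizontal axis and the "width" direction the vertical axis, and the optional clipping to the image bounds is added only for illustration.

```python
def expand_box(box, ext_len, ext_wid, img_w=None, img_h=None):
    """Expand a detected box by half of each extension length on every side.

    box      -- (x1, y1, x2, y2); this corner format is an assumption
    ext_len  -- first extension length (assumed to apply horizontally)
    ext_wid  -- second extension length (assumed to apply vertically)
    img_w/h  -- optional image size used to clip the result (an assumption)
    """
    x1, y1, x2, y2 = box
    x1, x2 = x1 - ext_len / 2, x2 + ext_len / 2
    y1, y2 = y1 - ext_wid / 2, y2 + ext_wid / 2
    if img_w is not None:
        x1, x2 = max(0, x1), min(img_w, x2)
    if img_h is not None:
        y1, y2 = max(0, y1), min(img_h, y2)
    return (x1, y1, x2, y2)


def expand_box_by_own_size(box, img_w=None, img_h=None):
    """Variant in which each extension length equals the detected box's own size."""
    x1, y1, x2, y2 = box
    return expand_box(box, x2 - x1, y2 - y1, img_w, img_h)
```

With the second variant, a 20 by 40 box becomes a 40 by 80 box centred on the same point, before any clipping.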

In this way, determining the box of the target object in the image by object detection reduces the amount of data to be processed for motion recognition. After the relatively small initial bounding box has been determined, it is expanded so that the object box used for motion recognition contains more complete information about the target object and more information about its surroundings, which helps improve the accuracy of motion recognition.

The motion feature information is information extracted from the images of the video clip that characterizes the motion of the target object.

In step S130, scenario feature information and time-series feature information corresponding to the target object are determined based on the video clip and the motion feature information.

Here, the scenario feature information represents the features of the scenario in which the target object is located, and may be obtained by performing scenario feature extraction on at least some of the associated images of a key frame image.

The time-series feature information is feature information that is associated in time with the motion of the target object; for example, it may be the motion feature information of objects in the video clip other than the target object. In a specific implementation, it may be determined based on the video clip and the motion feature information of the target object.

In step S140, the motion type of the target object is determined based on the motion feature information, the scenario feature information, and the time-series feature information.

After the motion feature information, the scenario feature information, and the time-series feature information have been determined, the three may be merged, for example by concatenation. The merged information is then classified to obtain the motion type of the target object, thereby recognizing the motion of the target object.
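Merging by concatenation followed by classification can be sketched in plain Python as below. The source does not specify the classifier, so the single linear layer with softmax, the function names, and the toy weights in the usage example are all assumptions made for illustration.

```python
import math


def fuse_features(motion, scenario, time_series):
    """Merge the three feature vectors by concatenation."""
    return list(motion) + list(scenario) + list(time_series)


def classify(fused, weights, biases):
    """Assumed classifier: one linear layer followed by softmax.

    weights -- one row of coefficients per motion type (hypothetical values)
    biases  -- one bias per motion type (hypothetical values)
    Returns (index of the most probable motion type, class probabilities).
    """
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs


# Toy usage with made-up numbers: two motion types, a 4-dimensional fused vector.
fused = fuse_features([1.0, 0.0], [0.5], [2.0])
label, probs = classify(fused, [[1, 0, 0, 0], [0, 0, 0, 1]], [0.0, 0.0])
```

In the embodiment described later, the motion and scenario vectors are each 2048-dimensional; the toy vectors here are kept tiny only so the arithmetic can be followed by hand.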

In the embodiment of the present invention, the motion feature information is determined using the object box corresponding to the target object rather than the entire frame image. This effectively reduces the amount of data processed per frame image for motion recognition, so that more images can be used for motion recognition, which helps improve recognition accuracy. Furthermore, the embodiment does not classify and recognize motion from the motion feature information of the target object alone: it also uses the video clip and the determined motion feature information to extract scenario feature information of the scenario in which the target object is located and time-series feature information associated with the motion of the target object. Combining the scenario feature information and the time-series feature information with the motion feature information further improves the accuracy of motion recognition.

In an embodiment of the present invention, motion recognition of the target object is performed through the network architecture shown in FIG. 1B, which is an exemplary diagram of a network architecture provided in an embodiment of the present invention. The network architecture includes a user terminal 201, a network 202, and a motion recognition terminal device 203. To support an exemplary application, the user terminal 201 and the motion recognition terminal device 203 establish a communication connection through the network 202. When the user terminal 201 needs to obtain the motion type of a target object, it first sends request information for determining the motion type to the motion recognition terminal device 203 through the network 202. The motion recognition terminal device 203 then acquires a video clip and determines motion feature information using the object box corresponding to the target object; uses the video clip and the determined motion feature information to extract scenario feature information of the scenario in which the target object is located and time-series feature information associated with the motion of the target object; and finally determines the motion type of the target object with higher accuracy by jointly considering the scenario feature information and the time-series feature information, and feeds the determined motion type back to the user terminal 201.

By way of example, the user terminal 201 may include a device with data processing capability, and the motion recognition terminal device 203 may include an image capture device and a processing device with data processing capability, or a remote server. The network 202 may be a wired or a wireless connection. When the motion recognition terminal device 203 is a processing device with data processing capability, the user terminal 201 may communicate with it over a wired connection, for example by data communication over a bus; when the motion recognition terminal device 203 is a remote server, the user terminal 201 may exchange data with the remote server over a wireless network.

In some embodiments, as shown in FIG. 2, determining the motion feature information of the target object based on the object box of the target object in a key frame image of the video clip may be implemented using the following steps.

In step S210, for a key frame image, a plurality of associated images corresponding to the key frame image is selected from the video clip.

Here, an associated image of a key frame image is an image whose image features are similar to those of the key frame image; for example, it may be an image captured at a time close to that of the key frame image.

In a specific implementation, the associated images corresponding to a key frame image may be selected using the following sub-steps.

In sub-step 1, a first sub video clip that includes the key frame image is selected from the video clip. The first sub video clip further includes N images adjacent to the key frame image in time sequence, where N is a positive integer.

Within the first sub video clip, the key frame image may be located in the first half, in the second half, or of course at or near the middle.

In a possible embodiment, a sub video clip containing the key frame image, for example a 64-frame sub video clip, may be cut from the video clip. The key frame image may be at or near the middle of the sub video clip; for example, the sub video clip consists of the 32 frame images before the key frame image, the key frame image, and the 31 frame images after it. Alternatively, the key frame image may be in the first half of the sub video clip; for example, the sub video clip consists of the 10 frame images before the key frame image, the key frame image, and the 53 frame images after it. The key frame image may also be in the second half of the sub video clip; for example, the sub video clip consists of the 50 frame images before the key frame image, the key frame image, and the 13 frame images after it.

The key frame image may also be located at either end of the first sub video clip; that is, the N images adjacent to the key frame image in time sequence are the N images before it or the N images after it. The present invention does not limit the position of the key frame image within the first sub video clip.
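Cutting a sub video clip around a key frame can be sketched as an index computation. The helper below is hypothetical, and clamping the window at the boundaries of the video clip is an assumption that the source does not discuss.

```python
def sub_clip_indices(key_idx, num_frames, before, after):
    """Return the frame indices of a sub video clip around a key frame.

    before/after -- number of frames taken before and after the key frame;
                    before + after equals N in the text above.
    The window is clamped to the clip boundaries (an assumption).
    """
    start = max(0, key_idx - before)
    end = min(num_frames - 1, key_idx + after)
    return list(range(start, end + 1))


# 64-frame window with the key frame near the middle: 32 before, 31 after.
centered = sub_clip_indices(key_idx=100, num_frames=300, before=32, after=31)

# Key frame at the end of the window: all N = 63 neighbours precede it.
trailing = sub_clip_indices(key_idx=100, num_frames=300, before=63, after=0)
```

Both calls return 64 indices; only the position of frame 100 inside the window differs.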

In sub-step 2, the plurality of associated images is selected from the first sub video clip.

In a possible implementation, the associated images may be selected from the first sub video clip at a preset time interval; for example, the first sub video clip is sparsely sampled with period τ to obtain T frames of associated images. The selected associated images may or may not include the key frame image, which introduces a degree of randomness; the present invention does not limit whether the associated images include the key frame image.
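Sparse sampling with period τ can be illustrated as a strided slice. In this sketch the parameter names `tau`, `T` and `offset` are illustrative, and the `offset` parameter is an assumption added to show why the key frame may or may not end up among the sampled images.

```python
def sparse_sample(sub_clip, tau, T, offset=0):
    """Take every `tau`-th frame index starting at `offset`, keeping at most T."""
    return sub_clip[offset::tau][:T]


# A 64-frame sub video clip (frames 68..131) whose key frame is frame 100.
sub_clip = list(range(68, 132))
with_key = sparse_sample(sub_clip, tau=8, T=8)               # starts at frame 68
without_key = sparse_sample(sub_clip, tau=8, T=8, offset=1)  # starts at frame 69
```

With τ = 8 and T = 8, the first call samples frames 68, 76, and so on up to 124, which includes the key frame 100; shifting the start by one frame yields eight frames that skip it.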

Selecting the associated images at a preset time interval from a sub video clip whose capture time is close to that of the key frame image yields the images most closely associated with the key frame image, and determining the motion feature information from these images improves its accuracy.

Other methods may also be used to select the associated images of a key frame image. For example, the image similarity between each frame image in the first sub video clip and the key frame image may first be calculated, and the images with the highest similarity may then be selected as the associated images of the key frame image.
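The similarity-based alternative can be sketched as a top-k selection. The source does not name a similarity measure, so the use of cosine similarity over per-frame descriptor vectors, like the function names, is purely an assumption for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(key_vec, frame_vecs, k):
    """Return, in time order, the indices of the k frames most similar to the key frame."""
    ranked = sorted(range(len(frame_vecs)),
                    key=lambda i: cosine_similarity(key_vec, frame_vecs[i]),
                    reverse=True)
    return sorted(ranked[:k])


# Toy descriptors: frames 0 and 2 point in roughly the same direction as the key frame.
selected = top_k_similar([1, 0], [[1, 0], [0, 1], [1, 1], [-1, 0]], k=2)
```

Returning the indices in time order keeps the selected associated images usable as a temporal sequence.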

In step S220, according to the object box corresponding to the key frame image, partial images are cropped from at least some of the associated images corresponding to the key frame image, yielding a plurality of target object images corresponding to the key frame image.

Here, the object box corresponding to the key frame image is used to crop partial images from some or all of the associated images of the key frame image. If the target object images are cropped from only some of the associated images, those images may be the ones whose capture times are closest to that of the key frame image. Other methods of choosing them may of course be used; for example, associated images may be picked from the full set at a fixed time interval.

When the target object images are cropped according to the object box corresponding to the key frame image, the object box is first copied, in time order, to all or some of the associated images. The copy uses the coordinate information of the object box in the key frame image: based on those coordinates, the box position is either offset over time or copied directly, giving the object box in each associated image. Once the object box has been copied, each associated image is cropped according to it; that is, the image inside the object box in the associated image is cut out and used as a target object image.
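The direct-copy variant of this cropping step can be sketched as follows. The sketch assumes each image is a nested list of rows and the box is given in integer pixel coordinates `(x1, y1, x2, y2)` with exclusive right and bottom edges; it does not cover the variant in which the box position is offset over time.

```python
def crop(image, box):
    """Cut out the region inside `box` from an image stored as a list of rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def crop_associated_images(associated_images, key_box):
    """Copy the key frame's box unchanged to every associated image and crop it."""
    return [crop(image, key_box) for image in associated_images]


# Toy 4x5 "image" whose pixel value encodes its (row, column) position.
image = [[r * 10 + c for c in range(5)] for r in range(4)]
patches = crop_associated_images([image, image], key_box=(1, 1, 3, 3))
```

Each entry of `patches` is then one target object image for the key frame.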

The role of the key frame image is to locate the target object images; it is not necessarily used directly to determine the motion feature information. For example, when the associated images do not include the key frame image, no target object image for determining the motion feature information is cropped from the key frame image.

In step S230, the motion feature information of the target object is determined based on the plurality of target object images corresponding to the key frame image.

After the target object images have been cropped, motion feature extraction may be performed on them. Specifically, a three-dimensional (3D) convolutional neural network may be used to process the target object images and extract their motion features, yielding the motion feature information of the target object.

In addition, in the embodiment of the present invention, after the plurality of target object images has been obtained and before the motion feature information of the target object is determined, the target object images may be processed using the following step.

Each target object image is set to an image with a preset image resolution. The preset image resolution is higher than the original image resolution of the target object image. In a specific implementation, an existing method or tool may be used to set the image resolution of the target object image; for example, the resolution may be adjusted by interpolation or a similar method.

Setting the cropped target object images to the preset resolution increases the amount of information they contain: enlarging the cropped target object image preserves more fine-grained detail of the target object, which improves the accuracy of the determined motion feature information.
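Enlarging a cropped image to a preset resolution can be sketched with nearest-neighbour interpolation, the simplest interpolation scheme. The source only says "interpolation or a similar method", so the choice of nearest-neighbour and the helper name are assumptions; a practical system would more likely use bilinear or bicubic interpolation from an image library.

```python
def resize_nearest(image, out_h, out_w):
    """Resize an image (list of rows) to out_h x out_w by nearest-neighbour lookup."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Enlarge a 2x2 crop to a preset 4x4 resolution.
enlarged = resize_nearest([[1, 2], [3, 4]], out_h=4, out_w=4)
```

Every source pixel is repeated in a 2 by 2 block here, so all cropped images reach the same H×W size regardless of the size of their object boxes.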

In a specific implementation, the preset image resolution may be set to H×W. If T target object images are cropped for each key frame image and each target object image has 3 channels, the input to the 3D convolutional neural network for motion feature extraction is an image block of size T×H×W×3. After the 3D convolutional neural network processes the input image block and applies global average pooling, a 2048-dimensional feature vector is obtained; this feature vector is the motion feature information.
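The final pooling step can be illustrated in isolation. The sketch below omits the 3D convolutional neural network itself and only shows how global average pooling collapses a feature map into one value per channel; the `[channel][time][height][width]` layout of the feature map is an assumption.

```python
def global_average_pool(feature_map):
    """Average each channel over all time steps and spatial positions.

    feature_map -- nested lists indexed as [channel][time][height][width]
                   (this layout is an assumption for illustration)
    Returns one float per channel.
    """
    pooled = []
    for channel in feature_map:
        values = [v for frame in channel for row in frame for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy feature map: 2 channels, 1 time step, 2x2 spatial grid.
toy_map = [
    [[[1, 2], [3, 4]]],
    [[[10, 10], [10, 10]]],
]
vector = global_average_pool(toy_map)
```

With 2048 output channels in the last layer of the network, the same operation yields the 2048-dimensional feature vector mentioned above.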

In the embodiment of the present invention, the object box of the target object in the key frame image is used for positioning, and the target object images for determining the motion feature information are cropped from the plurality of associated images of the key frame image. This improves the precision of the images used to determine the motion feature information and allows more images to be used, thereby improving the accuracy of motion recognition.

In some embodiments, as shown in FIG. 3, determining the scenario characteristic information and the time-series characteristic information corresponding to the target object based on the video clip and the motion characteristic information includes the following steps.

In step S310, for the key frame image, a plurality of associated images corresponding to the key frame image are selected from the video clip, and a video scenario feature extraction operation is performed on at least some of the associated images to obtain the scenario characteristic information.

Here, specifically, a 3D convolutional neural network may be used to perform video scenario feature extraction and global average pooling on some or all of the associated images to obtain a 2048-dimensional feature vector, and this feature vector is the scenario characteristic information.

In step S320, a time-series feature extraction operation is performed on objects other than the target object in the video clip to obtain initial time-series characteristic information.

Here, the initial time-series characteristic information is a time-series feature of objects other than the target object, for example the motion features of those other objects. In a specific implementation, it may be determined through the following sub-steps.

In sub-step 1, for the key frame image, a second sub-video clip including the key frame image is selected from the video clip, wherein the second sub-video clip further includes P images adjacent to the key frame image in time sequence, and P is a positive integer.

In the second sub-video clip, the key frame image may be located in the first half of the second sub-video clip, may be located in the second half of the second sub-video clip, and may of course also be located at or near the middle of the second sub-video clip.

In addition, the key frame image may be located at either end of the second sub-video clip; that is, the P images adjacent to the key frame image in time sequence are the P images preceding the key frame image or the P images following it. The present invention does not limit the position of the key frame image within the second sub-video clip.
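The selection of a second sub-video clip can be sketched as an index computation. This is only an illustration under stated assumptions: frames are addressed by integer index, and the hypothetical parameter `num_before` says how many of the P adjacent images precede the key frame, which covers the start, middle, and end placements described above. None of these names appear in the specification.

```python
def select_sub_clip(num_frames, key_index, p, num_before):
    """Return the frame indices of a sub-clip holding the key frame plus P neighbors.

    num_before of the P neighbors precede the key frame; the remaining
    p - num_before neighbors follow it.
    """
    if p <= 0:
        raise ValueError("P must be a positive integer")
    if not 0 <= num_before <= p:
        raise ValueError("num_before must lie between 0 and P")

    start = key_index - num_before
    end = key_index + (p - num_before)  # inclusive last index
    if start < 0 or end >= num_frames:
        raise ValueError("sub-clip does not fit inside the video clip")
    return list(range(start, end + 1))


centered = select_sub_clip(num_frames=10, key_index=4, p=4, num_before=2)
key_last = select_sub_clip(num_frames=10, key_index=4, p=3, num_before=3)
key_first = select_sub_clip(num_frames=10, key_index=4, p=3, num_before=0)
```

Here `centered` places the key frame in the middle, while `key_last` and `key_first` place it at the two ends of the sub-clip.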

In a possible implementation, a sub-video clip including the key frame image is cropped from the video clip, for example a sub-video clip of 2 seconds. This sub-video clip is relatively long and is used to determine time-series features over a long time span.

In sub-step 2, motion features of objects other than the target object are extracted from each image in the second sub-video clip, and the obtained motion features are used as the initial time-series characteristic information.

Here, specifically, a 3D convolutional neural network may be used to extract the motion features of objects other than the target object from the sub-video clip, and the obtained initial time-series characteristic information may be stored and used in the form of a video time-series feature library (Long-term Feature Bank, LFB).

In this embodiment of the present invention, a sub-video clip whose capture time is relatively close to that of the key frame image is selected from the video clip for time-series feature extraction. This reduces the data volume of the extracted time-series features and strengthens the association between the determined time-series features and the key frame image, which helps improve the accuracy of motion recognition. In addition, in this embodiment the motion features of other objects are used as the time-series features, which makes the time-series features used for motion recognition more targeted and likewise helps improve the accuracy of motion recognition.

In step S330, the time-series characteristic information corresponding to the target object is determined based on the initial time-series characteristic information and the motion characteristic information.

Here, specifically, time-series feature extraction may be performed on the initial time-series characteristic information and the motion characteristic information to obtain the time-series characteristic information corresponding to the target object.

In a possible implementation, the following sub-steps may be used to perform time-series feature extraction on the initial time-series characteristic information and the motion characteristic information, thereby obtaining the time-series characteristic information corresponding to the target object.

In sub-step 1, dimensionality reduction is performed separately on the initial time-series characteristic information and on the motion characteristic information.

After the initial time-series characteristic information of the objects other than the target object and the motion characteristic information of the target object are obtained, dimensionality reduction is performed on both. This reduces the amount of data to be processed and helps improve the efficiency of motion recognition.

In a possible implementation, after the initial time-series characteristic information and the motion characteristic information are obtained, random dropout (Dropout) processing may also be performed on them. The Dropout processing may be implemented in the last network layer of the neural network used to extract the initial time-series characteristic information and the motion characteristic information, or in each network layer of that neural network.

In sub-step 2, an average pooling operation is performed on the dimension-reduced initial time-series characteristic information.

In sub-step 3, a merging operation is performed on the average-pooled initial time-series characteristic information and the dimension-reduced motion characteristic information to obtain the time-series characteristic information corresponding to the target object. The merging operation may specifically be channel merging, that is, the channels of one piece of characteristic information are appended to the channels of the other. Alternatively, the merging operation may be an addition operation, that is, the average-pooled initial time-series characteristic information and the dimension-reduced motion characteristic information are added together.
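The two merging variants described in sub-step 3 differ in the size of their output, which a short sketch makes concrete. It assumes each piece of characteristic information is a flat list of channel values; the function names `merge_by_channel` and `merge_by_addition` are illustrative only.

```python
def merge_by_channel(pooled_initial, reduced_motion):
    """Channel merging: append the channels of one feature to the other."""
    return list(pooled_initial) + list(reduced_motion)


def merge_by_addition(pooled_initial, reduced_motion):
    """Addition: element-wise sum, which requires equal channel counts."""
    if len(pooled_initial) != len(reduced_motion):
        raise ValueError("addition needs features with the same number of channels")
    return [a + b for a, b in zip(pooled_initial, reduced_motion)]


concatenated = merge_by_channel([1.0, 2.0], [10.0, 20.0])
added = merge_by_addition([1.0, 2.0], [10.0, 20.0])
```

Channel merging doubles the channel count here, whereas addition keeps it unchanged, so the choice affects the input size of whatever layer follows.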

Sub-steps 2 and 3 in effect perform a time-series feature extraction operation on the initial time-series characteristic information and the motion characteristic information, and may specifically be implemented using the simplified time-series feature extraction module shown in FIG. 4. The simplified time-series feature extraction module 400 shown in FIG. 4 is used to extract the time-series characteristic information and may specifically include a linear (Linear) layer 401, an average pooling (Average) layer 402, a normalization and activation function (LN+ReLU) layer 403, and a random dropout (Dropout) layer 404. In sub-step 2, the time-series feature extraction operation is simplified: only the average pooling (Average) layer is used to perform the average pooling operation on the dimension-reduced initial time-series characteristic information, and no normalization (softmax) operation is performed. This simplifies the steps of time-series feature extraction, that is, it simplifies the existing time-series feature extraction module, and can improve the efficiency of motion recognition. Here, the existing time-series feature extraction module does not include an average pooling layer but includes a classification normalization softmax layer, and the processing performed in the softmax layer is more complex than the average pooling operation. In addition, the existing time-series feature extraction module further includes a linear layer before the dropout layer, whereas the simplified time-series feature extraction module of the present invention does not include that linear layer, which can further improve the efficiency of motion recognition.
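As a rough illustration of how the four named layers could fit together, the sketch below chains a linear layer, average pooling, LN+ReLU, and dropout in pure Python. The exact wiring of FIG. 4 is not reproduced in this text, so the order used here (linear reduction of both inputs, pooling of the feature library, merging by addition, then LN+ReLU and dropout) is an assumption, as are all function and parameter names.

```python
import math
import random


def linear(vec, weight, bias):
    """Linear layer used for dimensionality reduction: weight @ vec + bias."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def average_pool(vectors):
    """Average a list of equally sized vectors element-wise (no softmax weighting)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def layer_norm_relu(vec, eps=1e-5):
    """Layer normalization followed by ReLU."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [max(0.0, (x - mean) / math.sqrt(var + eps)) for x in vec]


def dropout(vec, rate, training, rng):
    """Inverted dropout; acts as the identity outside training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def simplified_module(bank, motion, w_bank, b_bank, w_motion, b_motion,
                      rate=0.2, training=False, rng=None):
    """Assumed wiring: reduce, pool the bank, add, LN+ReLU, dropout."""
    rng = rng or random.Random(0)
    reduced_bank = [linear(v, w_bank, b_bank) for v in bank]      # sub-step 1
    reduced_motion = linear(motion, w_motion, b_motion)           # sub-step 1
    pooled = average_pool(reduced_bank)                           # sub-step 2
    merged = [p + m for p, m in zip(pooled, reduced_motion)]      # sub-step 3
    return dropout(layer_norm_relu(merged), rate, training, rng)


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
out = simplified_module([[1.0, 3.0], [3.0, 5.0]], [0.0, 0.0],
                        identity, zeros, identity, zeros)
```

Because pooling is a plain mean, its cost grows linearly with the number of library entries and needs no exponentials, which is consistent with the efficiency argument made above.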

In a specific implementation, the time-series characteristic information output by the time-series feature extraction module may be a 512-dimensional feature vector, and this 512-dimensional feature vector is the time-series characteristic information of the target object.

In this embodiment of the present invention, scenario features are extracted from some or all of the associated images associated with the key frame image, so relatively complete scenario characteristic information can be obtained, and the accuracy of motion recognition can be improved on that basis. In addition, this embodiment extracts the time-series features of objects other than the target object, that is, the initial time-series characteristic information, and determines the time-series characteristic information associated with the target object based on the time-series features of the other objects and the motion characteristic information of the target object. Using this time-series characteristic information associated with the target object can further improve the accuracy of motion recognition.

To further improve the accuracy of the extracted time-series characteristic information, a plurality of time-series feature extraction modules may be connected in series to extract it, with the time-series characteristic information extracted by one module used as the input of another. Specifically, the time-series characteristic information corresponding to the target object extracted by the previous time-series feature extraction module is used as new initial time-series characteristic information, and the process returns to the step of performing dimensionality reduction separately on the initial time-series characteristic information and the motion characteristic information.

In a specific implementation, three simplified time-series feature extraction modules may be connected in series to determine the final time-series characteristic information.
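The series connection can be sketched as a simple loop in which each module's output replaces the initial time-series characteristic information for the next module, while the motion characteristic information is fed to every stage. The `cascade` helper and the toy stage below are hypothetical stand-ins, not the modules of the specification.

```python
def cascade(modules, initial_features, motion_features):
    """Run modules in series; each output becomes the next module's initial input."""
    current = initial_features
    for module in modules:
        current = module(current, motion_features)
    return current


def toy_stage(initial, motion):
    """Stand-in for one extraction module: element-wise addition only."""
    return [i + m for i, m in zip(initial, motion)]


final_features = cascade([toy_stage] * 3, [0.0, 0.0], [1.0, 2.0])
```

With three stages, the toy input is combined with the motion features three times, mirroring the three-module arrangement mentioned above.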

The motion recognition method of the present invention is described below through a specific embodiment.

As shown in FIG. 5, this embodiment of the present invention performs motion recognition with a person as the target object. Specifically, the motion recognition method of this embodiment may include the following steps.

In step 1, a video clip 501 is acquired, and key frame images are selected from the video clip.

In step 2, a human body detector 502 is used to localize persons in each key frame image, obtaining an initial object bounding box of each person, that is, of the target object.

In step 3, the initial object bounding box is expanded according to preset expansion size information to obtain the final object frame. The object frame is then used to crop partial images from the associated images associated with the key frame image, obtaining the target object images corresponding to each key frame image.
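Step 3 can be illustrated with a small sketch. It assumes the box is given as `(x1, y1, x2, y2)` in pixel coordinates, that the preset expansion size is a fixed pixel margin per side, and that the expanded frame is clamped to the image bounds; these conventions and the function names are assumptions made for illustration.

```python
def expand_box(box, expand_w, expand_h, image_w, image_h):
    """Grow a (x1, y1, x2, y2) box by fixed margins, clamped to the image."""
    x1, y1, x2, y2 = box
    return (
        max(0, x1 - expand_w),
        max(0, y1 - expand_h),
        min(image_w, x2 + expand_w),
        min(image_h, y2 + expand_h),
    )


def crop(image, box):
    """Crop a row-major image (list of rows) with an exclusive right/bottom edge."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def crop_associated_images(images, frame):
    """Apply the same object frame to every associated image of a key frame."""
    return [crop(image, frame) for image in images]


frame = expand_box((10, 10, 20, 20), expand_w=5, expand_h=5, image_w=22, image_h=100)
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = crop_associated_images([toy_image, toy_image], (1, 1, 3, 3))
```

Reusing one frame for all associated images keeps the crops spatially aligned across time, which is what a 3D convolutional network operating on the stacked crops would expect.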

In step 4, the target object images corresponding to all the key frame images are input into a 3D convolutional neural network 503, which extracts the motion features of the target object to obtain the motion characteristic information corresponding to the target object.

In step 5, the associated images associated with the key frame image are input into the 3D convolutional neural network 503, which extracts the video scenario features of the scenario in which the target object is located to obtain the scenario characteristic information.

In step 6, another 3D convolutional neural network 503 is used to perform time-series feature extraction on the video clip, that is, to extract the motion features of objects other than the target object, obtaining the initial time-series characteristic information, which may exist in the form of a time-series feature library. Here, the time-series features may be extracted from the entire video clip, or from a relatively long sub-video clip of the video clip that includes the key frame image.

In step 7, a simplified time-series feature extraction module 504 is used to perform a time-series feature extraction operation on the initial time-series characteristic information and the motion characteristic information, obtaining the time-series characteristic information corresponding to the target object.

In step 8, the time-series characteristic information, the motion characteristic information, and the scenario characteristic information are concatenated, and a motion classifier 505 classifies the concatenated information to obtain the motion type of the target object.
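Step 8 can be sketched as a concatenation followed by a classifier. The specification does not describe the internals of the motion classifier 505 here, so the single linear scoring layer with an argmax below is an assumption, and the names `classify_motion`, `weights`, `biases`, and `labels` are hypothetical. With the dimensions given in the text, the concatenated vector would hold 512 + 2048 + 2048 = 4608 values; the toy call uses one value per feature.

```python
def classify_motion(time_series, motion, scenario, weights, biases, labels):
    """Concatenate the three features and pick the label with the highest score."""
    fused = list(time_series) + list(motion) + list(scenario)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


predicted = classify_motion(
    time_series=[1.0],
    motion=[2.0],
    scenario=[3.0],
    weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],  # toy weights, one row per class
    biases=[0.0, 0.0],
    labels=["standing", "walking"],              # hypothetical motion types
)
```

In the toy call the scores are 1.0 and 5.0, so the second label is selected.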

Corresponding to the above motion recognition method, the present invention further provides a motion recognition apparatus, which is applied to a hardware device, such as a terminal device, that performs motion recognition on a target object. Each module can implement the same method steps as the above method and achieve the same beneficial effects, so the common parts are not described again.

Specifically, as shown in FIG. 6, the motion recognition apparatus provided by the present invention may include:

a video acquisition module 610 configured to acquire a video clip;

a motion characteristic determination module 620 configured to determine motion characteristic information of a target object based on an object frame of the target object in a key frame image of the video clip;

a scenario and time-series characteristic determination module 630 configured to determine scenario characteristic information and time-series characteristic information corresponding to the target object based on the video clip and the motion characteristic information; and

a motion recognition module 640 configured to determine the motion type of the target object based on the motion characteristic information, the scenario characteristic information, and the time-series characteristic information.

In some embodiments, the motion characteristic determination module 620 is further configured to determine the object frame in the key frame image;

perform object detection on the selected key frame image to determine an initial object bounding box of the target object in the key frame image;

In some embodiments, when determining the motion characteristic information of the target object based on the object frame of the target object in the key frame image of the video clip, the motion characteristic determination module 620 is configured to:

select, for the key frame image, a plurality of associated images corresponding to the key frame image from the video clip;

determine the motion characteristic information of the target object based on a plurality of target object images corresponding to the key frame image.

In some embodiments, when the motion characteristic determination module 620 selects the plurality of associated images corresponding to the key frame image from the video clip,

a first sub-video clip including the key frame image is selected from the video clip, wherein the first sub-video clip further includes N images adjacent to the key frame image in time sequence, and N is a positive integer;

In some embodiments, after the plurality of target object images are obtained and before the motion characteristic information of the target object is determined, the motion characteristic determination module 620 is further configured to:

In some embodiments, when determining the scenario characteristic information and the time-series characteristic information corresponding to the target object based on the video clip and the motion characteristic information, the scenario and time-series characteristic determination module 630 is configured to:

perform a time-series feature extraction operation on objects other than the target object in the video clip to obtain initial time-series characteristic information;

determine the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information.

In some embodiments, when the scenario and time-series characteristic determination module 630 performs the time-series feature extraction operation on objects other than the target object in the video clip to obtain the initial time-series characteristic information,

for the key frame image, a second sub-video clip including the key frame image is selected from the video clip, wherein the second sub-video clip further includes P images adjacent to the key frame image in time sequence, and P is a positive integer;

In some embodiments, when the scenario and time-series characteristic determination module 630 determines the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information,

an average pooling operation is performed on the dimension-reduced initial time-series characteristic information;

In some embodiments, when determining the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information, the scenario and time-series characteristic determination module 630 is further configured to:

An embodiment of the present invention discloses an electronic device which, as shown in FIG. 7, includes a processor 701 and a storage medium 702 connected to each other. The storage medium stores machine-readable instructions executable by the processor, and when the electronic device runs, the processor executes the machine-readable instructions to perform the steps of the above motion recognition method. Specifically, the processor 701 and the storage medium 702 may be connected via a bus 703.

When the machine-readable instructions are executed by the processor 701, the following steps of the motion recognition method are performed:

acquiring a video clip;

determining motion characteristic information of a target object based on an object frame of the target object in a key frame image of the video clip;

determining scenario characteristic information and time-series characteristic information corresponding to the target object based on the video clip and the motion characteristic information; and

determining the motion type of the target object based on the motion characteristic information, the scenario characteristic information, and the time-series characteristic information.

In addition, when the machine-readable instructions are executed by the processor 701, the method of any of the implementations described in the method section above may also be performed, which is not repeated here.

An embodiment of the present invention further provides a computer program product corresponding to the above method and apparatus, including a computer-readable storage medium storing program code. The instructions included in the program code are used to execute the method in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here. The computer-readable storage medium may be a volatile or non-volatile storage medium.

The above description of the embodiments focuses on the differences between them. For parts that are the same or similar, the embodiments may be referred to one another, and for brevity these parts are not described again.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the method embodiments, and they are not described again in the present invention. In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation; as another example, a plurality of modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, apparatus, or module, and may be electrical, mechanical, or of another form.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of them may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the above functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the method according to each embodiment of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above is merely a specific implementation of the present invention, and the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Accordingly, the protection scope of the present invention shall be determined by the protection scope of the claims.

The present invention provides a motion recognition method and apparatus, an electronic device, and a computer-readable storage medium, wherein the motion recognition method includes: acquiring a video clip; determining motion characteristic information of a target object based on an object frame of the target object in a key frame image of the video clip; determining scenario characteristic information and time-series characteristic information corresponding to the target object based on the video clip and the motion characteristic information; and determining a motion type of the target object based on the motion characteristic information, the scenario characteristic information, and the time-series characteristic information.
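The data flow of the four steps summarized above can be sketched as follows. This is only an illustrative outline, not the implementation described in the embodiments: the function name `recognize_motion`, the `Box` and `Features` types, and the four callables passed in are hypothetical placeholders for whatever detectors and feature extractors an implementation would use.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical conventions, not taken from the patent text.
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) of the object frame
Features = List[float]           # a flat feature vector


def recognize_motion(
    clip: Sequence[object],
    object_frame: Box,
    extract_motion: Callable[[Sequence[object], Box], Features],
    extract_scenario: Callable[[Sequence[object], Features], Features],
    extract_time_series: Callable[[Sequence[object], Features], Features],
    classify: Callable[[Features, Features, Features], str],
) -> str:
    """Wire the steps together; the clip is assumed to be already acquired."""
    # Motion features are computed from the object frame, not the whole image.
    motion = extract_motion(clip, object_frame)
    # Scenario and time-series features use the clip and the motion features.
    scenario = extract_scenario(clip, motion)
    time_series = extract_time_series(clip, motion)
    # The motion type is decided from all three kinds of features together.
    return classify(motion, scenario, time_series)
```

Passing the extractors in as parameters keeps the sketch independent of any particular network architecture; only the order of the steps and their inputs are taken from the text.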

Claims

1. A motion recognition method, comprising:
acquiring a video clip;
determining motion characteristic information of a target object based on an object frame of the target object in a key frame image of the video clip;
determining scenario characteristic information and time-series characteristic information corresponding to the target object based on the video clip and the motion characteristic information; and
determining a motion type of the target object based on the motion characteristic information, the scenario characteristic information, and the time-series characteristic information.

2. The method of claim 1,
further comprising determining the object frame in the key frame image, wherein determining the object frame in the key frame image comprises:
selecting the key frame image from the video clip;
performing object detection on the selected key frame image to determine an initial object bounding box of the target object in the key frame image; and
expanding the initial object bounding box according to preset expansion size information to obtain the object frame of the target object in the key frame image.

3. The method of claim 1 or 2,
wherein determining the motion characteristic information of the target object based on the object frame of the target object in the key frame image of the video clip comprises:
selecting, from the video clip, a plurality of related images corresponding to the key frame image;
cropping partial images from at least some of the related images corresponding to the key frame image according to the object frame corresponding to the key frame image, to obtain a plurality of target object images corresponding to the key frame image; and
determining the motion characteristic information of the target object based on the plurality of target object images corresponding to the key frame image.

4. The method of claim 3,
wherein selecting, from the video clip, the plurality of related images corresponding to the key frame image comprises:
selecting, from the video clip, a first sub-video clip that includes the key frame image, wherein the first sub-video clip further includes N images adjacent in time series to the key frame image, N being a positive integer; and
selecting the plurality of related images from the first sub-video clip.

5. The method of claim 3,
further comprising, after obtaining the plurality of target object images and before determining the motion characteristic information of the target object,
setting the target object images to images having a preset image resolution.

6. The method according to any one of claims 1 to 5,
wherein determining the scenario characteristic information and the time-series characteristic information corresponding to the target object based on the video clip and the motion characteristic information comprises:
selecting, from the video clip, a plurality of related images corresponding to the key frame image;
performing a video scenario feature extraction operation on at least some of the related images to obtain the scenario characteristic information;
performing a time-series feature extraction operation on objects other than the target object in the video clip to obtain initial time-series characteristic information; and
determining the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information.

7. The method of claim 6,
wherein performing the time-series feature extraction operation on objects other than the target object in the video clip to obtain the initial time-series characteristic information comprises:
selecting, for the key frame image, a second sub-video clip that includes the key frame image from the video clip, wherein the second sub-video clip further includes P images adjacent in time series to the key frame image, P being a positive integer; and
extracting, from the images in the second sub-video clip, motion characteristics of objects other than the target object, and using the obtained motion characteristics as the initial time-series characteristic information.

8. The method of claim 6 or 7,
wherein determining the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information comprises:
performing dimension reduction processing on the initial time-series characteristic information and the motion characteristic information, respectively;
performing an average pooling operation on the dimension-reduced initial time-series characteristic information; and
performing a merging operation on the average-pooled initial time-series characteristic information and the dimension-reduced motion characteristic information to obtain the time-series characteristic information corresponding to the target object.

9. The method of claim 8,
wherein determining the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information further comprises:
using the obtained time-series characteristic information corresponding to the target object as new initial time-series characteristic information, and returning to the step of performing dimension reduction processing on the initial time-series characteristic information and the motion characteristic information, respectively.

10. A motion recognition apparatus, comprising:
a video acquisition module configured to acquire a video clip;
a motion characteristic determining module configured to determine motion characteristic information of a target object based on an object frame of the target object in a key frame image of the video clip;
a scenario time-series characteristic determining module configured to determine, based on the video clip and the motion characteristic information, scenario characteristic information and time-series characteristic information corresponding to the target object; and
a motion recognition module configured to determine a motion type of the target object based on the motion characteristic information, the scenario characteristic information, and the time-series characteristic information.

11. The apparatus of claim 10,
wherein the motion characteristic determining module is further configured to determine the object frame in the key frame image, and, when determining the object frame in the key frame image, is configured to:
select the key frame image from the video clip;
perform object detection on the selected key frame image to determine an initial object bounding box of the target object in the key frame image; and
expand the initial object bounding box according to preset expansion size information to obtain the object frame of the target object in the key frame image.

12. The apparatus of claim 10 or 11,
wherein the motion characteristic determining module, when determining the motion characteristic information of the target object based on the object frame of the target object in the key frame image of the video clip, is configured to:
select, from the video clip, a plurality of related images corresponding to the key frame image;
crop partial images from at least some of the related images corresponding to the key frame image according to the object frame corresponding to the key frame image, to obtain a plurality of target object images corresponding to the key frame image; and
determine the motion characteristic information of the target object based on the plurality of target object images corresponding to the key frame image.

13. The apparatus of claim 12,
wherein the motion characteristic determining module, when selecting, from the video clip, the plurality of related images corresponding to the key frame image, is configured to:
select, from the video clip, a first sub-video clip that includes the key frame image, wherein the first sub-video clip further includes N images adjacent in time series to the key frame image, N being a positive integer; and
select the plurality of related images from the first sub-video clip.

14. The apparatus of claim 12,
wherein the motion characteristic determining module is further configured to, after obtaining the plurality of target object images and before determining the motion characteristic information of the target object,
set the target object images to images having a preset image resolution.

15. The apparatus according to any one of claims 10 to 14,
wherein the scenario time-series characteristic determining module, when determining the scenario characteristic information and the time-series characteristic information corresponding to the target object based on the video clip and the motion characteristic information, is configured to:
select, from the video clip, a plurality of related images corresponding to the key frame image;
perform a video scenario feature extraction operation on at least some of the related images to obtain the scenario characteristic information;
perform a time-series feature extraction operation on objects other than the target object in the video clip to obtain initial time-series characteristic information; and
determine the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information.

16. The apparatus of claim 15,
wherein the scenario time-series characteristic determining module, when performing the time-series feature extraction operation on objects other than the target object in the video clip to obtain the initial time-series characteristic information, is configured to:
select, for the key frame image, a second sub-video clip that includes the key frame image from the video clip, wherein the second sub-video clip further includes P images adjacent in time series to the key frame image, P being a positive integer; and
extract, from the images in the second sub-video clip, motion characteristics of objects other than the target object, and use the obtained motion characteristics as the initial time-series characteristic information.

17. The apparatus of claim 15 or 16,
wherein the scenario time-series characteristic determining module, when determining the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information, is configured to:
perform dimension reduction processing on the initial time-series characteristic information and the motion characteristic information, respectively;
perform an average pooling operation on the dimension-reduced initial time-series characteristic information; and
perform a merging operation on the average-pooled initial time-series characteristic information and the dimension-reduced motion characteristic information to obtain the time-series characteristic information corresponding to the target object.

18. The apparatus of claim 17,
wherein the scenario time-series characteristic determining module, when determining the time-series characteristic information corresponding to the target object based on the initial time-series characteristic information and the motion characteristic information, is further configured to:
use the obtained time-series characteristic information corresponding to the target object as new initial time-series characteristic information, and return to the step of performing dimension reduction processing on the initial time-series characteristic information and the motion characteristic information, respectively.

19. An electronic device,
comprising a processor and a storage medium connected to each other, wherein the storage medium stores machine-readable instructions executable by the processor, and when the electronic device operates, the processor executes the machine-readable instructions to perform the motion recognition method according to claim 9.

20. A computer-readable storage medium,
storing a computer program, wherein when the computer program is run by a processor, the motion recognition method according to any one of claims 1 to 9 is performed.

21. A computer program,
comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the motion recognition method according to any one of claims 1 to 9.
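
For illustration only, the feature fusion recited in claim 8 (dimension reduction, average pooling, then merging) can be sketched in plain Python. The sketch makes assumptions that the claims do not state: dimension reduction is modeled as a linear projection with given weight matrices, pooling is taken over the time axis, and merging is modeled as concatenation. The names `project`, `average_pool`, and `fuse` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, x: Vector) -> Vector:
    """Linear map, used here as an assumed stand-in for dimension reduction."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def average_pool(rows: List[Vector]) -> Vector:
    """Average a sequence of feature vectors over the time axis."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def fuse(
    initial_ts: List[Vector],  # one feature vector per time step
    motion: Vector,            # motion features of the target object
    w_ts: Matrix,              # assumed projection for time-series features
    w_motion: Matrix,          # assumed projection for motion features
) -> Vector:
    # Reduce the dimension of both inputs separately.
    reduced_ts = [project(w_ts, step) for step in initial_ts]
    reduced_motion = project(w_motion, motion)
    # Average-pool the reduced time-series features.
    pooled = average_pool(reduced_ts)
    # Merge by concatenation (an assumption; the claim only says "merging").
    return pooled + reduced_motion
```

The iteration of claim 9 would feed the fused result back in as new initial time-series features; the sketch omits this because it would require further assumptions about feature dimensions.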