KR102279958B1

KR102279958B1 - Method and Apparatus for Recognizing Animal State using Video and Sound

Info

Publication number: KR102279958B1
Application number: KR1020190086777A
Authority: KR
Inventors: 송병철
Original assignee: 인하대학교 산학협력단
Priority date: 2019-07-18
Filing date: 2019-07-18
Publication date: 2021-07-21
Also published as: KR20210009778A; KR102279958B9

Abstract

A method and apparatus for recognizing an animal's state using images and sound are presented. The proposed learning method for animal state recognition using images and sound includes: collecting, for a target animal group, a plurality of image data and sound data containing the animals' behavior, facial expressions, and sounds, performing annotation to determine the corresponding state of the target animal group for each item of image data and sound data, setting the state values specified by the annotation as ground truth (GT), and constructing a training data set; receiving the training data set for machine learning, setting the corresponding GTs as the outputs, and learning weight values and parameters with respect to those GTs; and collecting image data and sound data of an animal whose state is to be recognized, inputting them to the previously trained classifier, calculating a plurality of state values for the input image data and sound data, determining the state with the highest state value as the animal's state, and delivering the determined state to the user.

Description

Method and Apparatus for Recognizing Animal State using Video and Sound

The present invention relates to a method and apparatus for recognizing an animal's state using images and sound.

Today, the field of recognizing human facial expressions and emotions is developing rapidly, and the use of deep learning techniques to obtain facial expression information from many different people has made it possible to identify a person's emotions more efficiently.

Meanwhile, the number of companion animals such as dogs and cats is growing rapidly, and communication between humans and companion animals is becoming more important by the day. Pet TV and systems for remotely monitoring companion animals have appeared, but no system has been developed that automatically recognizes an animal's state, that is, one that makes communicating with the animal easy.

In the prior art, just as people relieve stress by watching TV, pet TV services such as Dog TV relieve a companion animal's anxiety by showing it animal-related video content.

Some services also let owners and companion animals interact. For example, when a companion animal is home alone, the service calms it by showing the owner's face or playing the owner's voice in real time through a remote monitor.

A company called Neoul Information (너울정보) developed Pet Pulse (펫펄스), an IoT interface that uses an animal's heart rate and similar signals. However, because the device must be fitted tightly around the animal's neck, it is uncomfortable for the animal, and the nature of heart rate data makes it practically impossible to recognize a variety of animal states.

In addition, the academic literature reports research on animal state recognition using animal electroencephalogram (EEG) information. However, EEG is harder to acquire than heart rate and is expensive, which makes commercialization difficult.

Like people, companion animals have inner states and express them through facial expressions, behavior, and sounds. They express hunger, pain, joy, sadness, and various physiological needs, but there is a limit to how well an owner can respond by intuition. Because spoken conversation is impossible, communicating with an animal is not easy for anyone but an expert.

The present invention proposes a system that recognizes a companion animal's state based on images and sound, that is, audio-visual (AV) signals.

Korean Laid-open Patent Publication No. 10-2019-0028022 (2019.03.18.)
Korean Registered Patent Publication No. 10-1873926 (2018.07.04.)
Korean Registered Patent Publication No. 10-1785888 (2017.10.17.)

As the number of companion animals such as dogs and cats grows rapidly and communication between humans and companion animals becomes more important by the day, the technical problem addressed by the present invention is to provide a system that uses an image sensor, a microphone, and the like to automatically recognize a target animal's state from its facial expressions, behavior, and sounds, and thereby to communicate with the animal. The present invention proposes a system that recognizes a companion animal's state based on images and sound, that is, audio-visual (AV) signals.

The animal state recognition method using images and sound proposed in the present invention must be preceded by a learning process, and learning first requires a data set. The data set is constructed as follows. A sufficiently large amount of image data and sound data is collected in pairs from animals similar to the target animal, for example from many different animals of the same species. For convenience, assume that the animal has N states and that a sufficient number of data pairs has been collected for each state. The step in which an expert decides the state of each (image, sound) data pair is called annotation, and it is carried out for the collected data. The state assigned to each data item in the training data set is called the ground truth (GT).
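The data set construction described above can be sketched as follows. This is only an illustration under stated assumptions: the state names in `STATES`, the file names, and the helper names `Sample`, `annotate`, `build_training_set`, and `samples_per_state` are hypothetical and do not appear in the specification, which says only that there are N states.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical label set; the text only says there are N states.
STATES = ["hunger", "pain", "joy", "sadness"]


@dataclass
class Sample:
    image_path: str            # recorded image or video clip (illustrative path)
    sound_path: str            # sound recorded together with the image
    gt: Optional[int] = None   # ground-truth state index, None until annotated


def annotate(sample: Sample, state: str) -> Sample:
    """Store the expert's decision as the sample's GT label."""
    sample.gt = STATES.index(state)  # ValueError if the state is not one of the N states
    return sample


def build_training_set(samples: List[Sample]) -> List[Sample]:
    """Keep only the (image, sound) pairs that have a GT label."""
    return [s for s in samples if s.gt is not None]


def samples_per_state(dataset: List[Sample]) -> Dict[str, int]:
    """Count pairs per state, to check that each state has enough data."""
    counts = Counter(s.gt for s in dataset)
    return {name: counts.get(i, 0) for i, name in enumerate(STATES)}


raw = [
    Sample("dog_001.mp4", "dog_001.wav"),
    Sample("dog_002.mp4", "dog_002.wav"),
    Sample("dog_003.mp4", "dog_003.wav"),
]
annotate(raw[0], "hunger")
annotate(raw[1], "joy")
training_set = build_training_set(raw)      # dog_003 is dropped: not yet annotated
coverage = samples_per_state(training_set)  # reveals states that still lack data
```

Counting pairs per state is one simple way to check the assumption that a sufficient number of data pairs has been collected for each state.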

Once the training data set for the target animal is ready, the following learning process is carried out. Learning may use a traditional machine learning method such as an SVM, or deep learning such as a CNN. Taking a CNN as an example, the CNN-based learning process consists of inputting the data of the data set to the CNN, setting the corresponding state values, that is, the GTs, as the outputs, and learning the CNN's weight values and other parameters through a given CNN training method such as back propagation. The learning process can normally be carried out offline in advance. The process may be carried out on the image data set alone, on the sound data set alone, or with each image-sound pair treated as a single data item.
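The learning loop described above (feed the data in, take the GTs as target outputs, update the weights by gradient steps) can be illustrated with a much smaller model than a CNN. The sketch below swaps in a single linear layer with a softmax output, trained by stochastic gradient descent, which is what back propagation reduces to for one layer. The feature vectors, the function names `train` and `predict`, and the hyperparameters are assumptions made for illustration only.

```python
import math
import random
from typing import List, Tuple


def softmax(z: List[float]) -> List[float]:
    m = max(z)                                # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores_for(W: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def train(features: List[List[float]], gts: List[int], n_states: int,
          epochs: int = 200, lr: float = 0.5, seed: int = 0
          ) -> Tuple[List[List[float]], List[float]]:
    """Learn weights W and biases b so that the GT state gets the highest score."""
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(n_states)]
    b = [0.0] * n_states
    for _ in range(epochs):
        for x, gt in zip(features, gts):
            p = softmax(scores_for(W, b, x))
            for k in range(n_states):
                # Gradient of the cross-entropy loss w.r.t. score k is p[k] - onehot(gt)[k].
                grad = p[k] - (1.0 if k == gt else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict(W: List[List[float]], b: List[float], x: List[float]) -> int:
    s = scores_for(W, b, x)
    return max(range(len(s)), key=lambda k: s[k])


# Toy, linearly separable features standing in for extracted image or sound features.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
gts = [0, 0, 1, 1]
W, b = train(features, gts, n_states=2)
predictions = [predict(W, b, x) for x in features]
```

In a real system the inner update would be handled by a deep learning framework and the model would be a CNN or similar network; the structure of the loop stays the same.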

A system that recognizes an animal's state in real time using a machine-learning-based classifier operates as follows: acquiring image data and sound data of the target animal whose state is to be recognized; inputting the acquired image and sound data to the previously trained classifier; calculating N state values; selecting the best state from the calculated state values; and delivering the result to the user in the form of sound or an image.

The step of classifying, in real time, the state of the animal from the image and sound may recognize the animal's state using only the image data, using only the sound data, or using both the image data and the sound data.

The animal's states include emotional states such as sadness and joy, and physiological states such as hunger and defecation. The image information captures the animal's corresponding facial expression or behavior, and the sound information captures the animal's corresponding sound.

In another aspect, the animal state recognition apparatus using images and sound proposed in the present invention includes a sensing unit that acquires image data and sound data of the animal whose state is to be recognized, an animal state determining unit that receives the acquired image data and sound data and classifies the animal's state in real time, and a state transmitting unit that delivers the determined state to the user by voice or image.
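The three units of the apparatus can be pictured as three small components wired in sequence. The following is a minimal sketch, assuming hypothetical class names (`SensingUnit`, `StateDeterminingUnit`, `StateTransmittingUnit`) and stub callables in place of a real camera, microphone, trained classifier, and display; none of these names come from the specification.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[List[int]]   # stand-in for image data
Audio = List[float]       # stand-in for sound data


class SensingUnit:
    """Acquires image and sound data, e.g. from a camera and a microphone."""

    def __init__(self, capture: Callable[[], Tuple[Frame, Audio]]):
        self._capture = capture

    def acquire(self) -> Tuple[Frame, Audio]:
        return self._capture()


class StateDeterminingUnit:
    """Runs the trained classifier and picks the state with the highest value."""

    def __init__(self, classifier: Callable[[Frame, Audio], Sequence[float]],
                 states: Sequence[str]):
        self._classifier = classifier
        self._states = states

    def determine(self, image: Frame, sound: Audio) -> str:
        scores = self._classifier(image, sound)
        best = max(range(len(scores)), key=lambda i: scores[i])
        return self._states[best]


class StateTransmittingUnit:
    """Delivers the result to the user; here the output is any callable."""

    def __init__(self, output: Callable[[str], None]):
        self._output = output

    def transmit(self, state: str) -> None:
        self._output(f"Detected state: {state}")


def recognize_once(sensing: SensingUnit, determining: StateDeterminingUnit,
                   transmitting: StateTransmittingUnit) -> str:
    image, sound = sensing.acquire()
    state = determining.determine(image, sound)
    transmitting.transmit(state)
    return state


# Stubs standing in for real hardware and a real trained model.
messages: List[str] = []
sensing = SensingUnit(lambda: ([[0, 0], [0, 0]], [0.0, 0.1]))
determining = StateDeterminingUnit(lambda img, snd: [0.1, 0.7, 0.2],
                                   ["joy", "hunger", "sadness"])
transmitting = StateTransmittingUnit(messages.append)
result = recognize_once(sensing, determining, transmitting)
```

Passing the capture, classifier, and output functions in as callables keeps the sketch independent of any particular device or model.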

The present invention can recognize a companion animal's state based on images and sound, that is, audio-visual (AV) signals. It can be developed as a smartphone app that lets the user communicate with an animal in real time. It can also be implemented as dedicated real-time animal communication hardware consisting of a camera, a microphone, a speaker, a display, a processor for computation, and so on. When the target animal is filmed at close range with the apparatus, the user can learn the animal's state in real time.

FIG. 1 is a flowchart illustrating a method of constructing a training data set for animal state recognition using images and sound according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram of an animal state recognition method according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a learning method for animal state recognition using images and sound according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the process of classifying an animal's state according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating an animal state recognition method using images and sound according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating an animal state recognition method using images and sound according to another embodiment of the present invention.
FIG. 7 is a diagram showing the configuration of an animal state recognition apparatus using images and sound according to an embodiment of the present invention.

The present invention proposes a method for classifying and recognizing the state of an animal such as a dog or cat. Animals, too, express their state through behavior, facial expressions, and sounds. Accordingly, sufficient image and sound data are acquired for a target animal group, and annotation is performed by animal experts. This produces a training data set in which each data item is classified into one of a limited number of states of the animal group. In the learning stage, machine learning such as support vector machines (SVM) or deep learning is trained on the training data set of the given target animal group. In the inference stage, images and sound are acquired in real time from any animal belonging to the same target animal group used for training. The acquired images and sound are input to the trained classifier and classified into a specific state in real time. Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method of constructing a training data set for animal state recognition using images and sound according to an embodiment of the present invention.

The proposed method of constructing a training data set for animal state recognition using images and sound includes: collecting, for a target animal group, a plurality of image data and sound data containing the animals' behavior, facial expressions, and sounds (110); performing annotation on each item of collected image data and sound data to determine the corresponding state of the target animal group (120); and setting the state values specified by the annotation as ground truth (GT) and constructing the training data set (130).

In step 110, a plurality of image data and sound data containing the animals' behavior, facial expressions, and sounds are collected for the target animal group. For example, image data can be sensed by a fixed camera near the animal or by a user-portable image acquisition device such as a smartphone. Sound data can be sensed by a microphone built into a nearby fixed camera, a microphone attached to the animal's body, or a microphone built into a user-portable device such as a smartphone.

In step 120, annotation is performed on each item of collected image data and sound data to determine the corresponding state of the target animal group. Then, in step 130, the state values specified by the annotation are set as ground truth (GT) and the training data set is constructed. That is, annotation yields a training data set for determining the state of the target animal group, the specified state values of that data set serve as the GT, and the training data set is organized by species.

FIG. 2 is a conceptual diagram of an animal state recognition method according to an embodiment of the present invention.

The method is broadly divided into an image-based network and a sound-based network. Image information 210 is collected and passed through the image-based network, which efficiently analyzes changes in the animal's facial expression or behavior (230). At the same time, when sound is present, sound information 220 is collected and additionally analyzed by the sound network (240). The final animal state is then determined using both kinds of information, the image data and the sound data (250).

FIG. 3 is a flowchart illustrating a learning method for animal state recognition using images and sound according to an embodiment of the present invention.

The proposed learning method for animal state recognition using images and sound includes: receiving a training data set for machine learning (310); setting the GTs corresponding to the received training data set as the outputs (320); and learning weight values and parameters with respect to those GTs (330).

As described above, to construct the training data set, a sufficiently large number of videos (including sound) is first recorded of animals of the same species as the target animal. Experts on the target animal then perform annotation, deciding the state of the animal in each data item. The specified state values of the resulting training data set are set as the ground truth (GT). Because characteristics differ from animal to animal and from species to species, the training data set is preferably organized by species. A training data set according to an embodiment of the present invention consists of images or sounds of animals of a specific species, and every data item must be annotated with exactly one state.
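The two constraints in this paragraph, one data set per species and exactly one annotated state per data item, can be checked mechanically. The sketch below is an illustration only; the record layout, the species and state names, and the function name `split_by_species` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (species, data_id, annotated_states); all values are illustrative.
Record = Tuple[str, str, List[str]]


def split_by_species(records: List[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Group annotated data by species, rejecting items without exactly one state."""
    datasets: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for species, data_id, states in records:
        if len(states) != 1:
            raise ValueError(
                f"{data_id}: expected exactly one state, got {len(states)}"
            )
        datasets[species].append((data_id, states[0]))
    return dict(datasets)


records: List[Record] = [
    ("chihuahua", "c1", ["hunger"]),
    ("persian_cat", "p1", ["joy"]),
    ("chihuahua", "c2", ["sadness"]),
]
by_species = split_by_species(records)
```

Each entry of `by_species` can then be used to train a separate classifier for that species.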

In step 310, a training data set for machine learning is received. Machine learning requires a sufficient amount of image data and sound data. As an example, take the Chihuahua, one breed of dog, which is a typical companion animal. Also assume that the target animal, the Chihuahua, has N states.

In step 320, the GTs corresponding to the received training data set are set as the outputs.

Animal state classification is learned using the constructed training data set. To perform machine learning, the GT corresponding to each input of the training data set is set as the output.

In step 330, the weight values or parameters of the machine learning module are learned through an optimization process.

For example, the image part may be built and trained as a typical HOG + SVM module, or as a CNN such as ResNet or DenseNet. When video is used rather than a single image, a CNN + LSTM structure may be used.

Image features and sound features for determining the state of the target animal group are extracted from the image data and the sound data, respectively, and the state of the target animal group is classified using the extracted image and sound features.

Here, machine learning may be performed using only the image data of the training data set, only the sound data of the training data set, or both the image data and the sound data.

FIG. 4 is a diagram illustrating the process of classifying an animal's state according to an embodiment of the present invention.

The image part 430 is divided into image feature extraction 431 and classification 432. Image feature extraction 431 receives the image input 410 and may extract hand-crafted features such as HoG, LBP, or Haar, or may use a neural network such as a CNN.
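As one concrete example of a hand-crafted feature named above, the sketch below computes a basic 8-neighbour local binary pattern (LBP) histogram for a grayscale image given as a list of rows. It is a plain-Python illustration, not the implementation of the embodiment; the function name `lbp_histogram` and the bit ordering of the neighbours are choices made here for illustration.

```python
from typing import List


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Return a normalized 256-bin histogram of 8-neighbour LBP codes.

    Each interior pixel is compared with its 8 neighbours; a neighbour that is
    greater than or equal to the centre sets one bit of the 8-bit code.
    """
    h, w = len(img), len(img[0])
    # Neighbours in clockwise order starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    if total == 0:            # image too small to have interior pixels
        return [0.0] * 256
    return [v / total for v in hist]


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]   # uniform patch: every neighbour >= centre
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]   # bright centre: no neighbour >= centre
hist_flat = lbp_histogram(flat)
hist_peak = lbp_histogram(peak)
```

The resulting 256-element vector is the kind of feature that could be handed to the classification stage 432.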

Classification 432 classifies the input features using a machine learning method such as an SVM or a random forest.

A CNN, RNN, LSTM, or similar model that integrates image feature extraction 431 and classification 432 into a single stage may also be used.

The sound part 440 is divided into sound feature extraction 441 and classification 442. Sound feature extraction 441 receives the sound input 420 and may extract hand-crafted features such as HoG, LBP, or Haar, or may use a neural network such as a CNN. The sound part 440 works in a similar way to the image part 430; only the input differs. However, given the one-dimensional nature of sound, a DNN, RNN, LSTM, GRU, or the like may be used instead of a CNN.

Meanwhile, the image score 450 and the sound score 460, produced by the machine learning results of the image part 430 and the sound part 440 respectively, are fused (470) to give the final classification result 480. The fusion 470 is usually a weighted average of the image scores 450 and the sound scores 460.
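The weighted-average fusion can be written in a few lines. This is a minimal sketch, assuming each part outputs one score per state in the same order; the function names `fuse_scores` and `final_state`, the state names, and the example weights are hypothetical, since the specification does not give weight values.

```python
from typing import List, Sequence


def fuse_scores(image_scores: Sequence[float], sound_scores: Sequence[float],
                w_image: float = 0.5, w_sound: float = 0.5) -> List[float]:
    """Weighted average of per-state image and sound scores."""
    if len(image_scores) != len(sound_scores):
        raise ValueError("both score vectors must cover the same N states")
    total = w_image + w_sound
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [(w_image * i + w_sound * s) / total
            for i, s in zip(image_scores, sound_scores)]


def final_state(fused: Sequence[float], states: Sequence[str]) -> str:
    """Pick the state whose fused score is highest."""
    best = max(range(len(fused)), key=lambda k: fused[k])
    return states[best]


states = ["joy", "hunger", "sadness"]
image_scores = [0.6, 0.3, 0.1]   # illustrative output of the image part
sound_scores = [0.2, 0.7, 0.1]   # illustrative output of the sound part

equal = fuse_scores(image_scores, sound_scores)
image_heavy = fuse_scores(image_scores, sound_scores, w_image=0.8, w_sound=0.2)
state_equal = final_state(equal, states)              # "hunger"
state_image_heavy = final_state(image_heavy, states)  # "joy"
```

The example shows that the choice of weights can change the final result when the two parts disagree, so in practice the weights would need to be tuned on validation data.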

According to an embodiment of the present invention, the animal's state can be classified using images only. In this case, the state is detected using only the trained image part of FIG. 4, that is, only the image portion of the camera output. For example, the camera may be fixed near the animal, like a CCTV camera, or may be a moving camera such as a smartphone.

According to another embodiment of the present invention, the animal's state can be classified using sound only. In this case, the state is detected using only the trained sound part of FIG. 4, that is, only the microphone output. For example, the microphone may be attached to the animal, fixed nearby, or built into a smartphone.

According to yet another embodiment of the present invention, the animal's state can be classified using both images and sound. This is the most general case and amounts to combining the image-only case and the sound-only case.
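The three embodiments (image only, sound only, both) can share one decision routine that uses whichever scores are available. The sketch below is an assumption-laden illustration: the function name `recognize`, the single `w_image` weight, and the example scores are hypothetical.

```python
from typing import Optional, Sequence


def recognize(states: Sequence[str],
              image_scores: Optional[Sequence[float]] = None,
              sound_scores: Optional[Sequence[float]] = None,
              w_image: float = 0.5) -> str:
    """Decide the state from image scores, sound scores, or both."""
    if image_scores is None and sound_scores is None:
        raise ValueError("at least one of image_scores or sound_scores is required")
    if sound_scores is None:            # image-only embodiment
        scores = list(image_scores)
    elif image_scores is None:          # sound-only embodiment
        scores = list(sound_scores)
    else:                               # both: weighted average of the two
        scores = [w_image * i + (1.0 - w_image) * s
                  for i, s in zip(image_scores, sound_scores)]
    best = max(range(len(scores)), key=lambda k: scores[k])
    return states[best]


states = ["joy", "hunger", "sadness"]
image_scores = [0.7, 0.2, 0.1]
sound_scores = [0.1, 0.3, 0.6]

from_image = recognize(states, image_scores=image_scores)
from_sound = recognize(states, sound_scores=sound_scores)
from_both = recognize(states, image_scores, sound_scores)
from_both_sound_heavy = recognize(states, image_scores, sound_scores, w_image=0.2)
```

Falling back to a single modality lets the same routine handle, for example, a silent animal or a camera with no microphone.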

FIG. 5 is a flowchart illustrating an animal state recognition method using images and sound according to an embodiment of the present invention.

The proposed animal state recognition method using images and sound includes: collecting image data and sound data of an animal whose state is to be recognized (510); inputting the collected image data and sound data to a previously trained classifier (520); calculating a plurality of state values for the input image data and sound data (530); determining the state with the highest of the calculated state values as the animal's state (540); and delivering the determined state to the user (550).

In step 510, image data and sound data of the animal whose state is to be recognized are collected. For example, image data can be sensed by a fixed camera near the animal or by a user-portable image acquisition device such as a smartphone. Sound data can be sensed by a microphone built into a nearby fixed camera, a microphone attached to the animal's body, or a microphone built into a user-portable device such as a smartphone.

In step 520, the collected image data and sound data are input to the previously trained classifier.

In step 530, a plurality of state values are calculated for the input image data and sound data.

In step 540, the state with the highest of the calculated state values is determined as the animal's state. Using the classifier trained in advance on the training data set, the image score and the sound score for the image data and sound data are combined by weighted average, and the state value with the highest score is selected. The state corresponding to the selected state value is determined as the animal's state.

Here, the animal's state may be recognized using only the image data of the animal, using only the sound data, or using both the image data and the sound data.

In step 550, the determined state of the animal is delivered to the user. The state may be delivered to the user by voice or image. The animal's states include emotional states such as sadness and joy, and physiological states such as hunger and defecation. The image data captures the animal's facial expression and behavior corresponding to its state, and the sound data captures the animal's sound corresponding to its state.

FIG. 6 is a flowchart illustrating an animal state recognition method using images and sound according to another embodiment of the present invention.

The proposed animal state recognition method using images and sound includes: collecting, for a target animal group, a plurality of image data and sound data containing the animals' behavior, facial expressions, and sounds, performing annotation to determine the corresponding state of the target animal group for each item of image data and sound data, setting the state values specified by the annotation as ground truth (GT), and constructing a training data set (610); receiving the training data set for machine learning, setting the corresponding GTs as the outputs, and learning weight values and parameters with respect to those GTs (620); and collecting image data and sound data of an animal whose state is to be recognized, inputting them to the previously trained classifier, calculating a plurality of state values for the input image data and sound data, determining the state with the highest state value as the animal's state, and delivering the determined state to the user (630).

In step 610, a plurality of image data and sound data including the animals' behavior, expressions, and sounds are collected for the target animal group. Annotation is performed on each item of image data and sound data to determine the corresponding state of the target animal group, the prescribed state values are set as ground truth (GT) using the annotation results, and a training data set is constructed.

For example, image data can be sensed with a fixed camera near the animal or with a portable image acquisition device such as a smartphone. Sound data can be sensed with a microphone built into a nearby fixed camera, a microphone attached to the animal's body, or a microphone built into a portable device such as a smartphone.

Thereafter, the prescribed state values are set as ground truth (GT) using the annotation results, and the training data set is constructed. That is, annotation is performed to build a training data set for determining the state of the target animal group, the prescribed state values of the training data set are set as GT, and the training data set is organized by specific species.
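The data set construction described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the `Sample` record, the `STATES` list, the `build_training_sets` helper, and the file names are all hypothetical, and the four states are simply the examples the description names.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical state list; the description names these four only as examples.
STATES = ["sadness", "joy", "hunger", "defecation"]


@dataclass(frozen=True)
class Sample:
    species: str     # specific species or breed, e.g. "chihuahua"
    video_path: str  # hypothetical path to the image/video clip
    audio_path: str  # hypothetical path to the sound clip
    gt: int          # ground-truth state index derived from the annotation


def build_training_sets(annotations):
    """Group annotated clips by species and turn state labels into GT indices."""
    by_species = defaultdict(list)
    for species, video_path, audio_path, state in annotations:
        gt = STATES.index(state)  # raises ValueError for an undefined state
        by_species[species].append(Sample(species, video_path, audio_path, gt))
    return dict(by_species)


# Made-up annotation records: (species, video, audio, annotated state).
annotations = [
    ("chihuahua", "clip001.mp4", "clip001.wav", "joy"),
    ("chihuahua", "clip002.mp4", "clip002.wav", "hunger"),
    ("poodle", "clip003.mp4", "clip003.wav", "sadness"),
]
training_sets = build_training_sets(annotations)
```

Keeping one list of samples per species mirrors the statement that the training data set is organized by specific species, so a separate classifier could be trained for each.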

In step 620, the training data set for machine learning is received as input, the corresponding GTs are determined as outputs, and weight values and parameters for the output GTs are learned.

Machine learning requires a sufficient amount of image data and sound data. For example, consider the Chihuahua, a breed of dog, which is a representative companion animal, and assume that the animal can be in one of N states. Animal state classification is learned using the constructed training data set: the training data set is received as input, and the corresponding GTs are determined as outputs.
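As a rough illustration of learning weight values and parameters so that inputs map to their GTs, the sketch below trains a small softmax classifier over N states by gradient descent. It is an assumption-laden stand-in: the description does not prescribe this model, the 2-D feature vectors are invented toy data, and the `train` and `predict` functions are hypothetical names.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def train(features, gts, n_states, epochs=200, lr=0.5, seed=0):
    """Fit weights W (n_states x dim) and biases b by SGD on cross-entropy."""
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(n_states)]
    b = [0.0] * n_states
    for _ in range(epochs):
        for x, gt in zip(features, gts):
            logits = [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(n_states)]
            p = softmax(logits)
            for k in range(n_states):
                grad = p[k] - (1.0 if k == gt else 0.0)  # d(loss)/d(logit_k)
                for j in range(dim):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict(W, b, x):
    """Return (index of the highest-scoring state, all state scores)."""
    scores = softmax([sum(w * v for w, v in zip(Wk, x)) + bk for Wk, bk in zip(W, b)])
    return scores.index(max(scores)), scores


# Toy, made-up feature vectors for N = 3 states with their GT indices.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
gts = [0, 0, 1, 1, 2, 2]
W, b = train(features, gts, n_states=3)
```

In practice the feature vectors would come from the image and sound front ends rather than being hand-written, and a far larger annotated data set would be needed.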

For the image part, for example, the machine learning may train a conventional HOG + SVM pipeline, or it may train a CNN such as RESNET or DenseNet. When video is used rather than a single image, a CNN + LSTM structure may be used.
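To make the HOG option more concrete, the sketch below computes a single magnitude-weighted histogram of gradient orientations for a small grayscale image. It is a deliberately simplified assumption: a full HOG descriptor divides the image into cells and normalizes over overlapping blocks, which is omitted here, and the `orientation_histogram` function name is hypothetical. The resulting vector is the kind of feature an SVM could then be trained on.

```python
import math


def orientation_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees).

    `image` is a list of rows of grayscale values. Border pixels are skipped
    because central differences need both neighbours.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal central difference
            gy = image[y + 1][x] - image[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]  # L2-normalised feature vector


# A vertical edge (dark left, bright right) has purely horizontal gradients.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
# A horizontal edge (dark top, bright bottom) has purely vertical gradients.
horizontal_edge = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]

vertical_hist = orientation_histogram(vertical_edge)
horizontal_hist = orientation_histogram(horizontal_edge)
```

The two test images put all of their gradient energy into different bins, which is what lets a downstream classifier tell edge directions, and therefore shapes and postures, apart.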

Image features and sound features for determining the state of the target animal group are extracted from the image data and the sound data, respectively, and the state of the target animal group is classified using the extracted image features and sound features.

In step 630, image data and sound data of the animal whose state is to be recognized are collected and input to the pre-trained classifier. A plurality of state values are calculated for the input image data and sound data, the state corresponding to the highest state value is determined as the state of the animal, and the determined state is delivered to the user.

Using the training data set of the pre-trained classifier, the image score and the sound score for the image data and sound data are combined by weighted averaging, and the state value with the highest score is selected. The state corresponding to the selected state value is determined as the state of the animal.
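The weighted averaging and selection described above can be sketched as follows. The score values, the state names, the `image_weight` parameter, and the function names `fuse_scores` and `decide_state` are assumptions made for illustration; the description does not specify how the weights are chosen.

```python
def fuse_scores(image_scores, sound_scores, image_weight=0.5):
    """Weighted average of per-state image and sound scores."""
    if len(image_scores) != len(sound_scores):
        raise ValueError("image and sound scores must cover the same N states")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    sound_weight = 1.0 - image_weight
    return [image_weight * v + sound_weight * s
            for v, s in zip(image_scores, sound_scores)]


def decide_state(image_scores, sound_scores, states, image_weight=0.5):
    """Return the state with the highest fused score, plus all fused scores."""
    fused = fuse_scores(image_scores, sound_scores, image_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return states[best], fused


# Hypothetical outputs of the image classifier and the sound classifier.
states = ["sadness", "joy", "hunger", "defecation"]
image_scores = [0.10, 0.60, 0.20, 0.10]
sound_scores = [0.05, 0.25, 0.65, 0.05]

state, fused = decide_state(image_scores, sound_scores, states, image_weight=0.6)
```

With these made-up numbers the image classifier favors joy and the sound classifier favors hunger, so the outcome depends on the weight. Setting `image_weight` to 1.0 or 0.0 reduces the sketch to image-only or sound-only recognition.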

The state of the animal includes emotional states, including sadness and joy, and physiological phenomena, including hunger and defecation. The image data shows the animal's expressions and behavior corresponding to its state, and the sound data contains the animal's sounds corresponding to its state.

FIG. 7 is a diagram showing the configuration of an apparatus for recognizing an animal state using images and sound according to an embodiment of the present invention.

The proposed apparatus for recognizing an animal state using images and sound includes a sensing unit 710, an animal state determining unit 720, and a state transmitting unit 730.

The sensing unit 710 collects image data and sound data of the animal whose state is to be recognized. For example, image data can be sensed with a fixed camera near the animal or with a portable image acquisition device such as a smartphone. Sound data can be sensed with a microphone built into a nearby fixed camera, a microphone attached to the animal's body, or a microphone built into a portable device such as a smartphone.

The image data and sound data of the animal whose state is to be recognized are collected through the sensing unit, and a CPU or GPU linked to the sensing unit performs the computation needed to classify the animal's state in real time. The image data and sound data collected by the sensing unit may be transferred over either a wireless or a wired connection. Such real-time classification of the animal's state can also be implemented as a smartphone app.

The animal state determining unit 720 receives the collected image data and sound data of the animal whose state is to be recognized, inputs them to the pre-trained classifier, calculates a plurality of state values for the input image data and sound data, and selects the state value with the highest value to determine the state of the animal.

Using the training data set of the pre-trained classifier, the image score and the sound score for the image data and sound data are combined by weighted averaging, and the state value with the highest score is selected.

The image score and the sound score, obtained from the machine learning result for the image part and the machine learning result for the sound part of the training data set, are combined by weighted averaging. This fusion yields, for example, N final scores, and the state with the highest score is determined as the state of the animal.

The state transmitting unit 730 delivers the determined state of the animal to the user. The state of the animal, which includes emotional states including sadness and joy and physiological phenomena including hunger and defecation, is delivered to the user by voice or video.
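The division of work among the three units can be sketched as follows. The class names, the stub classifiers, the placeholder sensor data, and the message format are all hypothetical; they stand in for real cameras, microphones, trained models, and a voice or video output channel.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[object], List[float]]


class SensingUnit:
    """Stand-in for unit 710: returns one (image, sound) observation."""

    def __init__(self, source: Callable[[], Tuple[object, object]]):
        self._source = source

    def collect(self) -> Tuple[object, object]:
        return self._source()


class AnimalStateDeterminer:
    """Stand-in for unit 720: scores both modalities, fuses them, picks the best state."""

    def __init__(self, states: Sequence[str], image_classifier: Classifier,
                 sound_classifier: Classifier, image_weight: float = 0.5):
        self.states = list(states)
        self.image_classifier = image_classifier
        self.sound_classifier = sound_classifier
        self.image_weight = image_weight

    def determine(self, image: object, sound: object) -> str:
        image_scores = self.image_classifier(image)
        sound_scores = self.sound_classifier(sound)
        fused = [self.image_weight * v + (1.0 - self.image_weight) * s
                 for v, s in zip(image_scores, sound_scores)]
        return self.states[fused.index(max(fused))]


class StateTransferUnit:
    """Stand-in for unit 730: formats the message that would be spoken or displayed."""

    def deliver(self, state: str) -> str:
        return f"Detected state: {state}"


# Stub sensor and classifiers with made-up outputs, for illustration only.
sensing = SensingUnit(lambda: ("frame-placeholder", "audio-placeholder"))
determiner = AnimalStateDeterminer(
    states=["sadness", "joy", "hunger"],
    image_classifier=lambda image: [0.2, 0.5, 0.3],
    sound_classifier=lambda sound: [0.1, 0.2, 0.7],
)
image, sound = sensing.collect()
message = StateTransferUnit().deliver(determiner.determine(image, sound))
```

Passing the classifiers in as callables keeps the determiner independent of whether the models run on a CPU, a GPU, or inside a smartphone app.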

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A method for recognizing an animal state using images and sound, the method comprising:
performing annotation on each of a plurality of image data and sound data, collected through a sensing unit for a target animal group and including the animals' behavior, expressions, and sounds, to determine the corresponding states of the target animal group, setting prescribed state values as ground truth (GT) using the annotation results, and constructing a training data set;
receiving, by an animal state determining unit, the training data set for machine learning as input, determining the corresponding GTs as outputs, and learning weight values and parameters for the output GTs; and
inputting image data and sound data of an animal whose state is to be recognized, collected through the sensing unit, to a pre-trained classifier through the animal state determining unit, calculating a plurality of state values for the input image data and sound data, determining the state corresponding to the highest state value as the state of the animal, and delivering the determined state of the animal to a user through a state transmitting unit,
wherein the step of receiving, by the animal state determining unit, the training data set for machine learning as input, determining the corresponding GTs as outputs, and learning the weight values and parameters for the output GTs comprises:
using the training data set, performing CNN machine learning including HOG and SVM, RESNET, and DenseNet for image data, or performing machine learning with a CNN and LSTM structure for video data; and
extracting image features and sound features for determining the state of the target animal group from the image data and the sound data, respectively, and performing fusion by weighted-averaging the image score and the sound score, obtained from the image data machine learning result and the sound data machine learning result using the extracted image features and sound features, to generate a final classification result.

The method of claim 1,
wherein the step of inputting the image data and sound data of the animal whose state is to be recognized to the pre-trained classifier through the animal state determining unit, calculating the plurality of state values, determining the state corresponding to the highest state value as the state of the animal, and delivering the determined state of the animal to the user through the state transmitting unit comprises:
weighted-averaging the image score and the sound score for the image data and sound data using the training data set of the pre-trained classifier, and selecting the state value with the highest score; and
recognizing the state of the animal using only the image data of the animal whose state is to be recognized, using only the sound data, or using both the image data and the sound data.

The method of claim 1,
wherein the state of the animal includes emotional states including sadness and joy and physiological phenomena including hunger and defecation, the image data shows the animal's expressions and behavior corresponding to the state of the animal, and the sound data contains the animal's sounds corresponding to the state of the animal.

An apparatus for recognizing an animal state using images and sound, the apparatus comprising:
a sensing unit that collects image data and sound data of an animal whose state is to be recognized;
an animal state determining unit that receives the collected image data and sound data of the animal whose state is to be recognized, inputs them to a pre-trained classifier, calculates a plurality of state values for the input image data and sound data, and selects the state value with the highest value to determine the state of the animal; and
a state transmitting unit that delivers the determined state of the animal to a user,
wherein the animal state determining unit:
performs annotation on each of a plurality of image data and sound data, collected through the sensing unit for a target animal group and including the animals' behavior, expressions, and sounds, to determine the corresponding states of the target animal group, sets prescribed state values as ground truth (GT) using the annotation results, and constructs a training data set;
receives the training data set for machine learning as input, determines the corresponding GTs as outputs, and learns weight values and parameters for the output GTs;
inputs the image data and sound data of the animal whose state is to be recognized, collected through the sensing unit, to the pre-trained classifier, calculates a plurality of state values for the input image data and sound data, and determines the state corresponding to the highest state value as the state of the animal;
using the training data set, performs CNN machine learning including HOG and SVM, RESNET, and DenseNet for image data, or performs machine learning with a CNN and LSTM structure for video data; and
extracts image features and sound features for determining the state of the target animal group from the image data and the sound data, respectively, and performs fusion by weighted-averaging the image score and the sound score, obtained from the image data machine learning result and the sound data machine learning result using the extracted image features and sound features, to generate a final classification result.

The apparatus of claim 4,
wherein the animal state determining unit:
weighted-averages the image score and the sound score for the image data and sound data using the training data set of the pre-trained classifier, and selects the state value with the highest score; and
recognizes the state of the animal using only the image data of the animal whose state is to be recognized, using only the sound data, or using both the image data and the sound data.

The apparatus of claim 4,
wherein the state transmitting unit delivers the state of the animal, which includes emotional states including sadness and joy and physiological phenomena including hunger and defecation, to the user by voice or video.

delete