KR20210009778A

KR20210009778A - Method and Apparatus for Recognizing Animal State using Video and Sound

Info

Publication number: KR20210009778A
Application number: KR1020190086777A
Authority: KR
Inventors: 송병철
Original assignee: 인하대학교 산학협력단
Priority date: 2019-07-18
Filing date: 2019-07-18
Publication date: 2021-01-27
Also published as: KR102279958B1; KR102279958B9

Abstract

Disclosed are a method and an apparatus for recognizing an animal's state using images and sounds. The method comprises the steps of: collecting a plurality of pieces of image data and sound data that include the behavior, facial expressions, and sounds of animals in a target animal group, annotating each piece of image and sound data to determine the corresponding state of the target animal group, setting the state values defined by the annotation results as the ground truth (GT), and constructing a training data set; receiving the training data set for machine learning, setting the corresponding GTs as the outputs, and learning the weight values and parameters for the output GTs; and collecting image data and sound data of an animal whose state is to be recognized, inputting them to the classifier trained in advance, calculating a plurality of state values for the input image and sound data, determining the state with the highest state value to be the animal's state, and delivering the determined state to the user.

Description

Method and Apparatus for Recognizing Animal State using Video and Sound

The present invention relates to a method and an apparatus for recognizing an animal's state using images and sounds.

Today, the field of recognizing human facial expressions and emotions is developing rapidly, and as deep learning techniques are used to obtain facial expression information from many different people, it has become possible to identify a person's emotions more efficiently.

Meanwhile, the number of companion animals such as dogs and cats is increasing rapidly, and communication between humans and companion animals is becoming more important by the day. Pet TV services and systems for monitoring companion animals remotely have appeared, but no system has been developed that automatically recognizes an animal's state, that is, one that makes it easy to communicate with the animal.

In the prior art, just as people relieve stress by watching TV, pet TV services such as Dog TV relieve a companion animal's anxiety by showing it animal-related video content.

Some services also let the owner and the companion animal interact. For example, when a companion animal is home alone, the service calms it by showing the owner's face or playing the owner's voice in real time through a remote monitor.

A company called 너울정보 (Neoul Information) has developed an IoT interface called 펫펄스 (Pet Pulse) that uses signals such as an animal's heart rate. However, because the device must be attached tightly around the animal's neck, it is uncomfortable for the animal, and because of the nature of heart rate as a signal, recognizing a variety of animal states is practically impossible.

In addition, research papers have studied animal state recognition using the animal's electroencephalogram (EEG). However, EEG is harder to acquire than heart rate and is expensive, which makes commercialization difficult.

Companion animals, like humans, have inner states and express them through facial expressions, behavior, and sounds. They express hunger, pain, joy, sadness, and various physiological phenomena, but there is a limit to how well an owner can respond by intuition alone. Because verbal conversation is not possible, communicating with an animal is not easy for anyone who is not an expert.

The present invention proposes a system that recognizes the state of a companion animal based on images and sound, that is, audio-visual (AV) signals.

Given that the number of companion animals such as dogs and cats is increasing rapidly and that communication between humans and companion animals is becoming more important by the day, the technical problem addressed by the present invention is to provide a system that uses an image sensor, a microphone, and the like to automatically recognize the state of a target animal from its facial expressions, behavior, and sounds, and thereby to communicate with that animal. The present invention proposes a system that recognizes the state of a companion animal based on images and sound, that is, audio-visual (AV) signals.

The animal state recognition method using images and sounds proposed by the present invention must be preceded by a training process, and training requires that a data set first be constructed. The data set is constructed as follows. A sufficiently large number of image data and sound data are collected in pairs from animals similar to the target animal, for example from several different animals of the same species. For convenience, assume that the animal has N states and that a sufficient number of data pairs has been collected for each state. The step in which an expert determines the state of each (image, sound) data pair is called annotation, and it is performed on all of the collected data. In the training data set, the state assigned to each data item is called the ground truth (GT).
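The data set described above can be pictured as a list of (image, sound) pairs, each carrying one GT index in the range 0 to N-1. The following is a minimal sketch under stated assumptions: the state names in `STATES`, the file names, and the helper names `annotate` and `samples_per_state` are hypothetical and do not come from the specification, which only says that there are N states.

```python
from dataclasses import dataclass

# Hypothetical state list (N = 4); the specification only says there are N states.
STATES = ["joy", "sadness", "hunger", "defecation"]


@dataclass(frozen=True)
class AnnotatedSample:
    image_path: str  # image or video clip of the animal
    sound_path: str  # audio clip paired with the image data
    gt: int          # ground-truth state index in [0, N)


def annotate(image_path, sound_path, state_name):
    """Turn an expert's state name into a GT-labelled (image, sound) pair."""
    if state_name not in STATES:
        raise ValueError(f"unknown state: {state_name!r}")
    return AnnotatedSample(image_path, sound_path, STATES.index(state_name))


def samples_per_state(dataset):
    """Count pairs per state, to check that every state has enough data."""
    counts = [0] * len(STATES)
    for sample in dataset:
        counts[sample.gt] += 1
    return dict(zip(STATES, counts))


# Illustrative file names only.
dataset = [
    annotate("clip_001.mp4", "clip_001.wav", "joy"),
    annotate("clip_002.mp4", "clip_002.wav", "hunger"),
    annotate("clip_003.mp4", "clip_003.wav", "joy"),
]
print(samples_per_state(dataset))
```

A per-state count such as this makes it easy to see which states still lack the "sufficient number of data pairs" that the text assumes.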

Once the training data set for the target animal is ready, the following training process is performed. Training may use a traditional machine learning method such as an SVM, or deep learning such as a CNN. Taking a CNN as an example, the CNN-based training process consists of inputting the data of the data set to the CNN, setting the corresponding state values, that is, the GTs, as the outputs, and learning the CNN's weight values and other parameters through a given CNN training method such as backpropagation. The training process can usually be performed offline in advance. The process may be performed on the image data set alone, on the sound data set alone, or with each (image, sound) pair treated as a single data item.
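To make the input, GT, and weight-update loop concrete, here is a small sketch that trains a single-layer softmax classifier by gradient descent. This is a deliberately simplified stand-in, not the CNN of the specification: for one linear layer, the update below is what backpropagation of the cross-entropy loss reduces to. The feature vectors, learning rate, and epoch count are made-up illustration values.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_scores(W, b, x):
    """Return one state value per state: softmax(W x + b)."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(z)


def train(features, gts, n_states, lr=0.5, epochs=200):
    """Learn weights W and biases b so that each input maps to its GT."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_states)]
    b = [0.0] * n_states
    for _ in range(epochs):
        for x, gt in zip(features, gts):
            p = predict_scores(W, b, x)
            for k in range(n_states):
                # Gradient of the cross-entropy loss w.r.t. the logit of state k.
                g = p[k] - (1.0 if k == gt else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * g * x[j]
                b[k] -= lr * g
    return W, b


# Made-up 2-D feature vectors for N = 3 states, two samples per state.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
gts = [0, 0, 1, 1, 2, 2]
W, b = train(features, gts, n_states=3)
```

In a real system the feature vectors would come from the image or sound data, and the single layer would be replaced by whichever model the implementer chooses.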

A system that recognizes an animal's state in real time using a machine-learning-based classifier operates as follows: acquiring image data and sound data of the target animal whose state is to be recognized, inputting the acquired image and sound data to a classifier trained in advance, calculating N state values, selecting the best state from among the calculated state values, and delivering the result to the user in the form of sound or video.
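The last two steps, selecting the best of the N state values and delivering a result, can be sketched as follows. The state names, the example scores, and the message wording are hypothetical; the sketch produces a text message, whereas the specification mentions delivery as sound or video.

```python
# Hypothetical state list (N = 4); the specification only says there are N states.
STATES = ["joy", "sadness", "hunger", "defecation"]


def select_state(state_values, states=STATES):
    """Pick the state whose calculated state value is highest."""
    if len(state_values) != len(states):
        raise ValueError("expected one state value per state")
    best = max(range(len(states)), key=lambda k: state_values[k])
    return states[best], state_values[best]


def make_message(state, value):
    """Format the result for the user (text stands in for audio or video output)."""
    return f"Your pet's state: {state} (score {value:.2f})"


# Made-up classifier output for one (image, sound) input.
state, value = select_state([0.10, 0.05, 0.70, 0.15])
message = make_message(state, value)
print(message)
```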

In the step of classifying the animal's state in real time using the images and sounds, the state may be recognized using only image data, using only sound data, or using both image data and sound data.

The state of the animal includes emotional states, including sadness and joy, and physiological phenomena, including hunger and defecation. The image information expresses the animal's facial expression or behavior corresponding to the state, and the sound information expresses the animal's sound corresponding to the state.

In another aspect, the apparatus for recognizing an animal's state using images and sounds proposed by the present invention includes a sensing unit that acquires image data and sound data of the animal whose state is to be recognized, an animal state determination unit that receives the acquired image data and sound data and classifies the animal's state in real time, and a state delivery unit that delivers the determined state to the user through audio or video.

The present invention can recognize the state of a companion animal based on images and sound, that is, audio-visual (AV) signals. The present invention may be developed as a smartphone app so that a user can communicate with an animal in real time. It may also be implemented as dedicated hardware for real-time animal communication consisting of a camera, a microphone, a speaker, a display, a processor for computation, and the like. When the target animal is filmed at close range using the device, the user can determine the target animal's state in real time.

FIG. 1 is a flowchart illustrating a method of constructing a training data set for recognizing an animal's state using images and sounds according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram of a method for recognizing an animal's state according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a training method for recognizing an animal's state using images and sounds according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a process of classifying the state of an animal according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method of recognizing an animal's state using images and sounds according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method of recognizing an animal's state using images and sounds according to another embodiment of the present invention.
FIG. 7 is a diagram illustrating the configuration of an apparatus for recognizing an animal's state using images and sounds according to an embodiment of the present invention.

The present invention proposes a method for classifying and recognizing the state of an animal such as a dog or a cat. Animals, too, express their state through behavior, facial expressions, and sounds. Therefore, sufficient image and sound data are acquired for a target animal group, and annotation is performed by animal experts. A training data set is built in this way, and the training data are classified into a limited number of states of that animal group. In the training stage, training is performed on the training data set of the given target animal group using machine learning such as SVMs (support vector machines) or deep learning. In the inference stage, images and sounds are acquired in real time from an arbitrary animal belonging to the same target animal group that was used for training. The acquired images and sounds are input to the trained classifier and classified in real time into a specific state. Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method of constructing a training data set for recognizing an animal's state using images and sounds according to an embodiment of the present invention.

The proposed method of constructing a training data set for recognizing an animal's state using images and sounds includes a step 110 of collecting, for a target animal group, a plurality of image data and sound data that include the animals' behavior, facial expressions, and sounds; a step 120 of performing annotation to determine the corresponding state of the target animal group for each of the collected image data and sound data; and a step 130 of setting the state values defined by the annotation results as the ground truth (GT) and constructing the training data set.

In step 110, a plurality of image data and sound data that include the animals' behavior, facial expressions, and sounds are collected for the target animal group. For example, image data may be sensed by a fixed camera near the animal or by a user-portable image acquisition device such as a smartphone. Sound data may be sensed by a microphone built into a nearby fixed camera, a microphone attached to the animal's body, or a microphone built into a user-portable device such as a smartphone.

In step 120, annotation is performed to determine the corresponding state of the target animal group for each of the collected image data and sound data. Then, in step 130, the state values defined by the annotation results are set as the GT, and the training data set is constructed. That is, annotation is performed to build a training data set for determining the state of the target animal group, the defined state values of the training data set are set as the GT, and the training data set is organized by specific species.
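Organizing the annotated data by species, as step 130 describes, amounts to grouping the records so that each species gets its own training data set. A minimal sketch follows; the record layout `(species, sample_id, gt)`, the species names, and the helper name `split_by_species` are assumptions made for illustration.

```python
from collections import defaultdict


def split_by_species(records):
    """Group annotated records into one training data set per species."""
    by_species = defaultdict(list)
    for species, sample_id, gt in records:
        by_species[species].append((sample_id, gt))
    return dict(by_species)


# Hypothetical annotated records: (species, sample id, GT state index).
records = [
    ("chihuahua", "clip_001", 0),
    ("persian_cat", "clip_002", 2),
    ("chihuahua", "clip_003", 1),
]
datasets = split_by_species(records)
print(sorted(datasets))
```

Each entry of `datasets` would then be used to train a separate classifier for that species.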

FIG. 2 is a conceptual diagram of a method for recognizing an animal's state according to an embodiment of the present invention.

The method is broadly divided into an image-based network and a sound-based network. Image information 210 is collected and analyzed by the image-based network to efficiently capture changes in the animal's facial expression or behavior (230). At the same time, if sound is present, sound information 220 is collected and additionally analyzed by the sound network (240). Both kinds of information, image data and sound data, are thus used to determine the animal's final state (250).

FIG. 3 is a flowchart illustrating a training method for recognizing an animal's state using images and sounds according to an embodiment of the present invention.

The proposed training method for recognizing an animal's state using images and sounds includes a step 310 of receiving a training data set for performing machine learning, a step 320 of setting the GTs corresponding to the received training data set as the outputs, and a step 330 of learning the weight values and parameters for the output GTs.

As described above, to build the training data set, a sufficiently large number of videos (including sound) are first recorded of animals of the same species as the target animal. Experts on the target animal then perform annotation, determining the state of the animal in each data item. The defined state values of the resulting training data set are set as the GT. Because characteristics differ from animal to animal and from species to species, the training data set is preferably organized by specific species. A training data set according to an embodiment of the present invention consists of images or sounds of animals of a specific species, and every data item must be annotated with exactly one state.

In step 310, a training data set for performing machine learning is received. Machine learning requires a sufficient amount of image data and sound data. For example, consider the Chihuahua, a breed of dog, which is a typical companion animal. Also assume that the target animal, the Chihuahua, has N states.

In step 320, the GTs corresponding to the received training data set are set as the outputs.

Animal state classification is learned using the constructed training data set. To perform machine learning, the GT corresponding to each input of the training data set is set as the output.

In step 330, the weight values or parameters in the machine learning module are learned through an optimization process.

For example, the image part may be built and trained as a typical HOG + SVM module, or as a CNN such as ResNet or DenseNet. When video is used rather than a single image, a CNN + LSTM structure may be used.

Image features and sound features for determining the state of the target animal group are extracted from the image data and the sound data, respectively, and the state of the target animal group is classified using the extracted image features and sound features.

Here, machine learning may be performed using only the image data of the training data set, using only its sound data, or using both its image data and sound data.

FIG. 4 is a diagram illustrating a process of classifying the state of an animal according to an embodiment of the present invention.

The image part 430 is divided into image feature extraction 431 and classification 432. Image feature extraction 431 receives an image as input 410 and may extract hand-crafted features such as HOG, LBP, or Haar, or may use a neural network such as a CNN.

Classification 432 classifies the input features using a machine learning method such as an SVM or a random forest.

Alternatively, a CNN, RNN, LSTM, or the like that integrates image feature extraction 431 and classification 432 into a single model may be used.
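As an illustration of a hand-crafted image feature of the kind named for image feature extraction 431, the sketch below computes a basic 8-neighbour LBP (local binary pattern) histogram for a small grayscale image stored as a list of rows. The bit ordering, the `>=` comparison, and the tiny example image are choices made for this sketch; the specification does not fix an LBP variant. The resulting 256-bin histogram is the kind of feature vector that classification 432 would receive.

```python
def lbp_code(img, r, c):
    """Return the 8-neighbour LBP code of the pixel at row r, column c."""
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Return a normalised 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Made-up 4x4 grayscale image: a bright 2x2 patch on a dark background.
img = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
feature = lbp_histogram(img)
```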

The sound part 440 is divided into sound feature extraction 441 and classification 442. Sound feature extraction 441 receives sound as input 420 and may extract hand-crafted features such as HOG, LBP, or Haar, or may use a neural network such as a CNN. The sound part 440 works in much the same way as the image part 430; only the input differs. However, given the one-dimensional nature of sound, a DNN, RNN, LSTM, GRU, or the like may be used instead of a CNN.
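Because sound is a one-dimensional signal, sound feature extraction 441 typically works on short frames of samples. The sketch below computes two simple per-frame descriptors, short-time energy and zero-crossing rate. These two features are an assumption chosen for illustration and are not named in the specification; the frame length and the example signal are also made up.

```python
def frame_features(samples, frame_len):
    """Return (energy, zero-crossing rate) for each non-overlapping frame."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(s * s for s in frame) / frame_len
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


# Made-up signal: one rapidly alternating frame, then one constant frame.
signal = [1, -1, 1, -1, 0.5, 0.5, 0.5, 0.5]
feats = frame_features(signal, frame_len=4)
```

The sequence of per-frame feature tuples could then be fed to a sequence model such as the RNN, LSTM, or GRU mentioned above, or pooled into a single vector for an SVM.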

Meanwhile, the image score 450 and the sound score 460, produced by the trained image part 430 and the trained sound part 440 respectively, are fused (470) to produce the final classification result 480. The fusion 470 is usually a weighted average of the image scores 450 and the sound scores 460.
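The weighted-average fusion 470 can be written as fused_k = w * image_k + (1 - w) * sound_k for each state k, followed by picking the state with the highest fused score. In the sketch below, the weight `w_image` and the example scores are made-up values; the specification does not say how the weights are chosen.

```python
def fuse(image_scores, sound_scores, w_image=0.5):
    """Weighted average of per-state image and sound scores."""
    if len(image_scores) != len(sound_scores):
        raise ValueError("both score lists must cover the same N states")
    if not 0.0 <= w_image <= 1.0:
        raise ValueError("w_image must lie in [0, 1]")
    w_sound = 1.0 - w_image
    return [w_image * i + w_sound * s for i, s in zip(image_scores, sound_scores)]


# Made-up scores for N = 3 states from the image part and the sound part.
image_scores = [0.6, 0.3, 0.1]
sound_scores = [0.2, 0.7, 0.1]

fused = fuse(image_scores, sound_scores, w_image=0.75)
best = max(range(len(fused)), key=fused.__getitem__)
```

With `w_image=0.75` the image part dominates and state 0 wins; with `w_image=0.25` the sound part dominates and state 1 wins, which shows that the final classification result depends on the fusion weights.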

According to an embodiment of the present invention, the animal's state can be classified using images only. In that case, the state is detected using only the trained image part of FIG. 4, that is, using only the image portion of the camera output. For example, the camera may be fixed near the animal, like a CCTV camera, or may be a moving camera such as a smartphone.

According to another embodiment of the present invention, the animal's state can be classified using sound only. In that case, the state is detected using only the trained sound part of FIG. 4, that is, using only the microphone output. For example, the microphone may be attached to the animal, fixed nearby, or built into a smartphone.

According to yet another embodiment of the present invention, the animal's state can be classified using both images and sound. Using both is the most general case and is equivalent to combining the image-only case and the sound-only case.

FIG. 5 is a flowchart illustrating a method of recognizing an animal's state using images and sounds according to an embodiment of the present invention.

The proposed method of recognizing an animal's state using images and sounds includes a step 510 of collecting image data and sound data of the animal whose state is to be recognized, a step 520 of inputting the collected image data and sound data to a classifier trained in advance, a step 530 of calculating a plurality of state values for the input image data and sound data, a step 540 of determining the state corresponding to the highest of the calculated state values to be the animal's state, and a step 550 of delivering the determined state to the user.

In step 510, image data and sound data of the animal whose state is to be recognized are collected. For example, image data may be sensed by a fixed camera near the animal or by a user-portable image acquisition device such as a smartphone. Sound data may be sensed by a microphone built into a nearby fixed camera, a microphone attached to the animal's body, or a microphone built into a user-portable device such as a smartphone.

In step 520, the collected image data and sound data are input to the classifier trained in advance.

In step 530, a plurality of state values are calculated for the input image data and sound data.

In step 540, the state corresponding to the highest of the calculated state values is determined to be the animal's state. Using the classifier trained in advance on the training data set, the image scores and sound scores for the image data and sound data are combined by weighted averaging, and the state value with the highest score is selected. The state corresponding to the selected state value is determined to be the animal's state.

Here, the animal's state may be recognized using only the image data of the animal, using only the sound data, or using both the image data and the sound data.

In step 550, the determined state of the animal is delivered to the user. The state may be delivered to the user through audio or video. The state of the animal includes emotional states, including sadness and joy, and physiological phenomena, including hunger and defecation. The image data express the animal's facial expression and behavior corresponding to the state, and the sound data express the animal's sound corresponding to the state.

도 6은 본 발명의 또 다른 실시예에 따른 영상과 소리를 이용한 동물 상태 인식 방법을 설명하기 위한 흐름도이다.6 is a flowchart illustrating a method for recognizing an animal condition using images and sounds according to another embodiment of the present invention.

제안하는 영상과 소리를 이용한 동물 상태 인식 방법은 타겟 동물군에 대해 동물의 행동, 표정, 소리를 포함하는 복수의 영상 데이터 및 소리 데이터를 수집하여 영상 데이터 및 소리 데이터 각각에 대해 해당하는 복수의 타겟 동물군의 상태를 결정하기 위해 어노테이션(annotation)을 수행하고, 어노테이션 수행 결과를 이용하여 규정된 상태값을 GT(groundtruth)로 설정하고, 학습용 데이터 셋을 구축하는 단계(610), 기계학습을 수행하기 위한 학습용 데이터 셋을 입력 받아 대응하는 GT들을 출력으로 결정하고, 출력된 GT들에 대한 가중치 값들 및 파라미터들을 학습하는 단계(620) 및 상태를 인식하고자 하는 동물의 영상 데이터 및 소리 데이터를 수집하여 미리 기계학습된 분류기에 입력하고, 입력된 영상 데이터 및 소리 데이터에 대한 복수의 상태값들을 계산하여 가장 높은 상태값에 해당하는 상태를 동물의 상태로 결정하고, 결정된 동물의 상태를 사용자에게 전달하는 단계(630)를 포함한다. The proposed animal state recognition method using images and sounds collects a plurality of image data and sound data including animal behavior, facial expressions, and sounds for a target animal group, and a plurality of targets corresponding to each of the image data and sound data. Performing an annotation to determine the state of the fauna, setting a prescribed state value to GT (groundtruth) using the result of the annotation execution, and constructing a data set for learning (610), performing machine learning In step 620 of learning the weight values and parameters for the output GTs and determining the corresponding GTs as output by receiving the data set for learning to be input, and collecting image data and sound data of an animal to recognize the state It inputs into a pre-machine-learned classifier, calculates a plurality of state values for the input image data and sound data, determines the state corresponding to the highest state value as the state of the animal, and delivers the determined state of the animal to the user. Step 630 is included.

단계(610)에서, 타겟 동물군에 대해 동물의 행동, 표정, 소리를 포함하는 복수의 영상 데이터 및 소리 데이터를 수집하여 영상 데이터 및 소리 데이터 각각에 대해 해당하는 복수의 타겟 동물군의 상태를 결정하기 위해 어노테이션(annotation)을 수행하고, 어노테이션 수행 결과를 이용하여 규정된 상태값을 GT(groundtruth)로 설정하고, 학습용 데이터 셋을 구축한다. In step 610, by collecting a plurality of image data and sound data including animal behavior, facial expression, and sound for the target animal group, the state of the plurality of target animal groups corresponding to the image data and sound data is determined. To do this, annotation is performed, and the specified state value is set to GT (groundtruth) using the result of the annotation execution, and a data set for learning is constructed.

For example, image data can be sensed by a fixed camera near the animal or by a user-portable image acquisition device such as a smartphone. Sound data can be sensed by a microphone built into a nearby fixed camera, a microphone attached to the animal's body, or a microphone built into a user-portable device such as a smartphone.

The prescribed state values are then set as ground truth (GT) using the annotation results, and the training data set is constructed. That is, annotation is performed to build a training data set for determining the state of the target animal group, the prescribed state values of the training data set are set as GT, and the training data set is organized by species.
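The data set construction described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the state vocabulary, the `Sample` record, the file paths, and the function `build_training_sets` are hypothetical names that do not appear in the source, which specifies only that annotated state values become the GT and that the data set is organized by species.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical state vocabulary; the source names these four only as examples.
STATES = ["sadness", "joy", "hunger", "defecation"]
STATE_TO_GT = {name: idx for idx, name in enumerate(STATES)}


@dataclass(frozen=True)
class Sample:
    species: str      # e.g. "chihuahua"
    image_path: str   # hypothetical location of the image or video clip
    sound_path: str   # hypothetical location of the matching audio clip
    gt: int           # ground-truth state index derived from the annotation


def build_training_sets(annotations):
    """Group annotated records by species and convert state names to GT indices."""
    per_species = defaultdict(list)
    for species, image_path, sound_path, state_name in annotations:
        gt = STATE_TO_GT[state_name]  # raises KeyError for an undefined state
        per_species[species].append(Sample(species, image_path, sound_path, gt))
    return dict(per_species)


# Each annotation is (species, image file, sound file, annotated state).
annotations = [
    ("chihuahua", "img/001.jpg", "snd/001.wav", "joy"),
    ("chihuahua", "img/002.jpg", "snd/002.wav", "hunger"),
    ("poodle", "img/003.jpg", "snd/003.wav", "sadness"),
]
datasets = build_training_sets(annotations)
```

Keeping one list per species mirrors the statement that the training data set is configured for each specific species, so a separate classifier could be trained on each list.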

In step 620, the training data set for machine learning is received, the corresponding GTs are determined as outputs, and weight values and parameters for the output GTs are learned.

Machine learning requires a sufficient amount of image data and sound data. For example, consider the Chihuahua, a breed of dog, which is a representative companion animal, and assume that the animal can be in one of N states. Animal state classification is learned using the constructed training data set: the training data set is received as input and the corresponding GTs are determined as outputs.
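To make the idea of learning weights and parameters against GT outputs concrete, the following is a small sketch of a softmax classifier trained by gradient descent. It is a stand-in chosen for brevity and is not one of the models named in the source; the toy feature vectors, the hyperparameters, and the function names are all assumptions for illustration.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into N state probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax(features, gts, num_states, epochs=200, lr=0.5, seed=0):
    """Learn a weight matrix and bias vector so each sample's GT scores highest."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_states)]
    biases = [0.0] * num_states
    for _ in range(epochs):
        for x, gt in zip(features, gts):
            logits = [sum(w * xi for w, xi in zip(weights[k], x)) + biases[k]
                      for k in range(num_states)]
            probs = softmax(logits)
            for k in range(num_states):
                # Cross-entropy gradient with respect to logit k: p_k - 1[k == gt].
                grad = probs[k] - (1.0 if k == gt else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
                biases[k] -= lr * grad
    return weights, biases


def predict_scores(weights, biases, x):
    """Return the N state values for one feature vector."""
    return softmax([sum(w * xi for w, xi in zip(wk, x)) + b
                    for wk, b in zip(weights, biases)])


# Toy 2-D vectors standing in for extracted image or sound features (N = 3).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
            [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
gts = [0, 0, 1, 1, 2, 2]
weights, biases = train_softmax(features, gts, num_states=3)
scores = predict_scores(weights, biases, [0.95, 0.05])
```

In a real system the feature vectors would come from the image and sound data, and the linear model would be replaced by whichever learner is chosen for each modality.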

For the image part, for example, machine learning may train a conventional HOG + SVM pipeline or a CNN such as ResNet or DenseNet. When video is used rather than a single image, a CNN + LSTM structure may be used.

In step 630, image data and sound data of the animal whose state is to be recognized are collected and input into a previously trained classifier, a plurality of state values are calculated for the input image data and sound data, the state corresponding to the highest state value is determined as the state of the animal, and the determined state is delivered to the user.

Using the classifier trained in advance on the training data set, the image score and the sound score for the image data and sound data are combined by weighted average, and the state value with the highest score is selected. The state corresponding to the selected state value is determined as the state of the animal.
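The weighted averaging and selection of the highest score can be sketched as below. The source does not specify the weights, so the `image_weight` parameter, the example scores, and the function names are assumptions used only for illustration.

```python
def fuse_scores(image_scores, sound_scores, image_weight=0.5):
    """Weighted average of per-state image and sound scores.

    image_weight is a hypothetical parameter: 1.0 uses the image only,
    0.0 uses the sound only, and values in between blend both modalities.
    """
    if len(image_scores) != len(sound_scores):
        raise ValueError("both modalities must score the same N states")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    sound_weight = 1.0 - image_weight
    return [image_weight * v + sound_weight * a
            for v, a in zip(image_scores, sound_scores)]


def decide_state(states, fused_scores):
    """Return the state whose fused score is highest."""
    best = max(range(len(fused_scores)), key=fused_scores.__getitem__)
    return states[best]


states = ["sadness", "joy", "hunger", "defecation"]
image_scores = [0.10, 0.60, 0.20, 0.10]   # example output of the image classifier
sound_scores = [0.05, 0.25, 0.65, 0.05]   # example output of the sound classifier
fused = fuse_scores(image_scores, sound_scores, image_weight=0.4)
state = decide_state(states, fused)
```

With these example numbers the image classifier alone favors joy, while the fused result favors hunger because the sound score carries more weight.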

The state of the animal includes emotional states such as sadness and joy and physiological phenomena such as hunger and defecation. The image data captures the animal's facial expressions and behavior corresponding to its state, and the sound data captures the animal's sounds corresponding to its state.

FIG. 7 is a diagram illustrating the configuration of an apparatus for recognizing animal state using images and sounds according to an embodiment of the present invention.

The proposed apparatus for recognizing animal state using images and sounds includes a sensing unit 710, an animal state determination unit 720, and a state transmission unit 730.

The sensing unit 710 collects image data and sound data of the animal whose state is to be recognized. For example, image data can be sensed by a fixed camera near the animal or by a user-portable image acquisition device such as a smartphone. Sound data can be sensed by a microphone built into a nearby fixed camera, a microphone attached to the animal's body, or a microphone built into a user-portable device such as a smartphone.

The image data and sound data of the animal whose state is to be recognized are collected through the sensing unit, and a CPU or GPU linked to the sensing unit performs the computation needed to classify the animal's state in real time. The image data and sound data collected by the sensing unit can be transferred over either a wireless or a wired connection. Real-time classification of the animal's state in this way can also be implemented as a smartphone app.

The animal state determination unit 720 receives the collected image data and sound data of the animal whose state is to be recognized, inputs them into a previously trained classifier, calculates a plurality of state values for the input image data and sound data, and selects the state value with the highest value to determine the state of the animal.

Using the classifier trained in advance on the training data set, the image score and the sound score for the image data and sound data are combined by weighted average, and the state value with the highest score is selected.

The image score and the sound score, produced respectively by the model trained on the image part and the model trained on the sound part of the training data set, are combined by weighted average. For example, this fusion yields N final scores. The state with the highest score is determined as the state of the animal.
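The fusion described above can be written compactly as follows. The symbols are assumptions introduced here for illustration: v_k and a_k denote the image and sound scores for state k, and w_v and w_a denote non-negative modality weights with w_v + w_a > 0, none of which are named in the source.

```latex
s_k = \frac{w_v\, v_k + w_a\, a_k}{w_v + w_a}, \qquad k = 1, \dots, N,
\qquad
\hat{k} = \arg\max_{1 \le k \le N} s_k
```

Setting w_a = 0 or w_v = 0 reduces this to recognition from image data only or from sound data only.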

The state transmission unit 730 delivers the determined state of the animal to the user. The state of the animal, including emotional states such as sadness and joy and physiological phenomena such as hunger and defecation, is delivered to the user through audio or video.
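The three units described above could be wired together roughly as in the following sketch. The class names, the callable-based interfaces, and the placeholder capture and classifier functions are hypothetical; the source defines only the roles of units 710, 720, and 730.

```python
class SensingUnit:
    """Stands in for unit 710: returns one (image, sound) observation."""

    def __init__(self, capture):
        self._capture = capture  # hypothetical callable wrapping camera and microphone

    def collect(self):
        return self._capture()


class StateDeterminationUnit:
    """Stands in for unit 720: scores N states and picks the highest."""

    def __init__(self, states, classifier):
        self._states = states
        self._classifier = classifier  # hypothetical callable returning N scores

    def determine(self, image, sound):
        scores = self._classifier(image, sound)
        best = max(range(len(scores)), key=scores.__getitem__)
        return self._states[best]


class StateTransmissionUnit:
    """Stands in for unit 730: forwards the determined state to the user."""

    def __init__(self, notify):
        self._notify = notify  # hypothetical callable, e.g. an app notification hook

    def transmit(self, state):
        self._notify(f"Detected animal state: {state}")


def run_once(sensing, determiner, transmitter):
    """One sense, determine, transmit cycle."""
    image, sound = sensing.collect()
    state = determiner.determine(image, sound)
    transmitter.transmit(state)
    return state


messages = []
sensing = SensingUnit(lambda: ("frame-bytes", "audio-bytes"))
determiner = StateDeterminationUnit(
    ["sadness", "joy", "hunger", "defecation"],
    lambda image, sound: [0.1, 0.2, 0.6, 0.1],  # placeholder for a trained classifier
)
transmitter = StateTransmissionUnit(messages.append)
result = run_once(sensing, determiner, transmitter)
```

Passing the capture, classifier, and notification functions in as callables keeps each unit independent of the specific camera, model, or delivery channel used.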

The state of the animal includes emotional states such as sadness and joy and physiological phenomena such as hunger and defecation. The image data captures the animal's facial expressions and behavior corresponding to its state, and the sound data captures the animal's sounds corresponding to its state.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method of constructing a training data set for animal state recognition using images and sounds, the method comprising:
collecting a plurality of image data and sound data including the behavior, facial expressions, and sounds of animals of a target animal group;
performing annotation on each of the collected image data and sound data to determine the state of the corresponding target animal group; and
setting prescribed state values as ground truth (GT) using the annotation results and constructing a training data set.

A learning method for animal state recognition using images and sounds, the method comprising:
receiving a training data set for performing machine learning;
determining, as outputs, the GTs corresponding to the received training data set; and
learning weight values and parameters for the output GTs.

The method of claim 2,
wherein, using the training data set, machine learning with HOG and SVM or with a CNN including RESNET and DenseNet is performed on image data, or machine learning with a CNN and LSTM structure is performed on video data, and
wherein the machine learning is performed using only the image data of the training data set, using only the sound data of the training data set, or using both the image data and the sound data of the training data set.

A method of recognizing animal state using images and sounds, the method comprising:
collecting image data and sound data of an animal whose state is to be recognized;
inputting the collected image data and sound data into a previously trained classifier;
calculating a plurality of state values for the input image data and sound data;
determining the state corresponding to the highest of the calculated state values as the state of the animal; and
delivering the determined state of the animal to a user.

The method of claim 4,
wherein determining the state corresponding to the highest of the calculated state values as the state of the animal comprises:
combining, by weighted average, the image score and the sound score for the image data and sound data using the classifier trained in advance on the training data set, and selecting the state value with the highest score; and
recognizing the state of the animal using only the image data of the animal whose state is to be recognized, using only the sound data, or using both the image data and the sound data.

The method of claim 4,
wherein the state of the animal includes emotional states including sadness and joy and physiological phenomena including hunger and defecation, the image data expresses the animal's facial expressions and behavior corresponding to the state of the animal, and the sound data expresses the animal's sounds corresponding to the state of the animal.

A method of recognizing animal state using images and sounds, the method comprising:
collecting a plurality of image data and sound data including the behavior, facial expressions, and sounds of animals of a target animal group, performing annotation to determine the state of the target animal group corresponding to each item of image data and sound data, setting prescribed state values as ground truth (GT) using the annotation results, and constructing a training data set;
receiving the training data set for performing machine learning, determining the corresponding GTs as outputs, and learning weight values and parameters for the output GTs; and
collecting image data and sound data of an animal whose state is to be recognized, inputting them into a previously trained classifier, calculating a plurality of state values for the input image data and sound data, determining the state corresponding to the highest state value as the state of the animal, and delivering the determined state of the animal to a user.

An apparatus for recognizing animal state using images and sounds, the apparatus comprising:
a sensing unit that collects image data and sound data of an animal whose state is to be recognized;
an animal state determination unit that inputs the collected image data and sound data into a previously trained classifier, calculates a plurality of state values for the input image data and sound data, and selects the state value with the highest value to determine the state of the animal; and
a state transmission unit that delivers the determined state of the animal to a user.

The apparatus of claim 8,
wherein the animal state determination unit:
combines, by weighted average, the image score and the sound score for the image data and sound data using the classifier trained in advance on the training data set, and selects the state value with the highest score; and
recognizes the state of the animal using only the image data of the animal whose state is to be recognized, using only the sound data, or using both the image data and the sound data.

The apparatus of claim 8,
wherein the state transmission unit delivers the state of the animal, including emotional states including sadness and joy and physiological phenomena including hunger and defecation, to the user through audio or video.