KR102349851B1

KR102349851B1 - System and method for providing multi-object recognition service using camera for pet

Info

Publication number: KR102349851B1
Application number: KR1020210039403A
Authority: KR
Inventors: 황도연
Original assignee: 주식회사 디랩스
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2022-01-11

Abstract

Provided is a system for providing a multi-object recognition service for companion animals using a camera, which comprises: a camera installed or provided to face a direction of a space where one or more companion animals live to photograph the companion animals, and outputting image data obtained by photographing the companion animals; a user terminal for outputting result data obtained by recognizing the companion animals when the image data collected by the camera is output; and a recognition service providing server including a registration unit for mapping and registering the user terminal and camera from the user terminal, a recognition unit for identifying and tracking multiple objects corresponding to the companion animals included in the image data received from the camera, and a transmission unit for transmitting the identified and tracked result data with the image data. Therefore, demands of a plurality of companion animals can be accurately recognized and immediately resolved.

Description

System and method for providing multi-object recognition service for companion animals using a camera

The present invention relates to a system and method for providing a multi-object recognition service for companion animals using a camera, and provides a platform that can identify and track multiple objects using a camera.

As industrial development changes the social structure, the number of households raising companion animals is growing with population aging, the shift to nuclear families, and the rapid increase in single-person households. According to an industrial economic analysis report by the Korea Institute for Industrial Economics and Trade (KIET), the domestic companion animal market is growing steadily. As more households treat companion animals as family members, it is reported that not only feed, supplies, and medical care but also dedicated monitoring systems and wearable products for companion animals, based on sensor and communication technologies, are being commercialized by domestic and foreign companies. When an owner is away for a long time, caring for a companion animal left alone can become a complicated problem. This concerns not only health problems but also the large and small economic losses caused by the animal's behavior. Examples include electric shock and explosion accidents caused by chewing on power cords or batteries, and fires caused by touching an electric range. According to reports from the National Fire Agency, the number of fires caused by companion animals is gradually increasing, and so is the resulting property damage.

Platforms that identify and track companion animals, analyze abnormal behavior, and control IoT devices to stop that behavior have been researched and developed. In this regard, the prior art documents Korean Registered Patent No. 10-1961669 (published March 26, 2019), Korean Patent Application Publication No. 2020-0055821 (published May 22, 2020), and Korean Patent Application Publication No. 2015-0101760 (published September 4, 2015) respectively disclose the following configurations. The first stores a companion animal's separation anxiety behavior patterns and a following space, so that the types of separation anxiety behavior are diversified and a following operation can relieve a dog's separation anxiety; a fixed camera photographs the following space and the companion animal's behavior within it and transmits the footage to a server, and when the server determines that the behavior is separation anxiety behavior, a follow command is received from the server and the companion animal is followed. The second generates CNN output values for a CNN that takes, as multi-stream input, each item of an image data set based on footage of a companion animal captured by a camera, generates LSTM output values for an LSTM that takes the CNN output values as input, and detects dangerous behavior of the companion animal based on the LSTM output values. The third determines from the companion animal's sound signal whether the companion animal is in an abnormal state and informs the owner, so that an abnormal state of a companion animal left alone at home can be detected and reported immediately to the owner at a remote location.

However, the configurations described above are usable only in a household with one or two animals. They cannot be used in businesses that must manage a plurality of companion animals at once, such as the care services that stand in for absent owners, for example companion animal kindergartens, dog cafes, or dog playgrounds. Moreover, even when such a system is installed at home, an owner who is constantly connected by smartphone, who receives a steady stream of work instructions and reminders from a supervisor or workplace, and who must remain on call at a moment's notice cannot rush home just because the system reports abnormal behavior. The prior art therefore discloses only technology that leaves an owner who cannot go home to watch helplessly on a smartphone while the companion animal cries pitifully. Accordingly, there is a need for research and development of a control platform that can designate and recognize a plurality of companion animals as multiple objects and, when abnormal behavior occurs, analyze it with artificial intelligence and take appropriate action.

An embodiment of the present invention can provide a method for providing a multi-object recognition service for companion animals using a camera. The method designates at least one companion animal as a multi-object to identify and track it; extracts behavior data from the companion animal's body data, expression data from its facial data, and sound data from the image data to determine the companion animal's state; and controls at least one IoT device so that what each companion animal currently wants can be supplied immediately. In this way, businesses that provide companion animal care services, as well as multi-dog or multi-cat households, can accurately identify the needs of a plurality of companion animals and resolve them immediately. However, the technical problem to be solved by this embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an embodiment of the present invention includes: a camera that is installed or provided to face the space in which at least one companion animal lives so as to photograph the at least one companion animal, and that outputs image data obtained by photographing the at least one companion animal; a user terminal that, when outputting the image data collected from the camera, also outputs result data obtained by recognizing the at least one companion animal; and a recognition service providing server including a registration unit that maps and registers the user terminal and the camera upon input from the user terminal, a recognition unit that identifies and tracks multiple objects corresponding to the at least one companion animal included in the image data received from the camera, and a transmission unit that transmits the identified and tracked result data together with the image data.

Another embodiment of the present invention includes the steps of: mapping and registering a user terminal and a camera upon input from the user terminal; identifying and tracking multiple objects corresponding to at least one companion animal included in image data received from the camera; and transmitting the identified and tracked result data together with the image data.
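The three steps of the method (register, identify and track, transmit) can be illustrated with a minimal Python sketch. This is not the implementation disclosed in the specification: the class name `RecognitionServer`, the identifiers, the `max_distance` threshold, and the greedy nearest-centroid association are all assumptions chosen for illustration, and object detection is assumed to have already produced one centroid per companion animal.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class RecognitionServer:
    """Hypothetical stand-in for the recognition service providing server."""
    max_distance: float = 50.0                       # assumed matching radius in pixels
    camera_to_terminal: dict = field(default_factory=dict)
    tracks: dict = field(default_factory=dict)       # track id -> last known centroid
    next_id: int = 1

    def register(self, terminal_id: str, camera_id: str) -> None:
        # Step 1: map the camera to the user terminal that registered it.
        self.camera_to_terminal[camera_id] = terminal_id

    def identify_and_track(self, centroids: list) -> dict:
        # Step 2: greedy nearest-centroid association (an assumed technique).
        # Tracks that receive no detection in this frame are dropped.
        assigned = {}
        unmatched = dict(self.tracks)
        for x, y in centroids:
            def distance(tid):
                return hypot(x - unmatched[tid][0], y - unmatched[tid][1])

            best = min(unmatched, key=distance, default=None)
            if best is not None and distance(best) <= self.max_distance:
                del unmatched[best]                  # reuse the existing identity
            else:
                best = self.next_id                  # a new companion animal appeared
                self.next_id += 1
            assigned[best] = (x, y)
        self.tracks = assigned
        return assigned

    def transmit(self, camera_id: str, frame: bytes, centroids: list) -> dict:
        # Step 3: bundle the result data with the image data for the mapped terminal.
        return {
            "terminal_id": self.camera_to_terminal[camera_id],
            "frame": frame,
            "objects": self.identify_and_track(centroids),
        }
```

In this sketch, two animals detected in one frame keep their identifiers in the next frame as long as each moves less than `max_distance` pixels; a production tracker would typically also use appearance features and tolerate missed detections.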

According to any one of the above means for solving the problem, at least one companion animal is designated as a multi-object to be identified and tracked; behavior data is extracted from the companion animal's body data, expression data from its facial data, and sound data from the image data to determine the companion animal's state; and at least one IoT device is controlled so that what each companion animal currently wants can be supplied immediately. As a result, businesses that provide companion animal care services, as well as multi-dog or multi-cat households, can accurately identify the needs of a plurality of companion animals and resolve them immediately.

FIG. 1 is a diagram illustrating a system for providing a multi-object recognition service for companion animals using a camera according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the recognition service providing server included in the system of FIG. 1.
FIGS. 3 and 4 are diagrams illustrating an embodiment in which the multi-object recognition service for companion animals using a camera according to an embodiment of the present invention is implemented.
FIG. 5 is an operation flowchart illustrating a method for providing a multi-object recognition service for companion animals using a camera according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The terms of degree "about", "substantially", and the like, as used throughout the specification, mean at or near the stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent an unscrupulous infringer from unfairly exploiting a disclosure in which exact or absolute figures are stated to aid understanding of the present invention. The phrases "step of (doing) ~" and "step of ~", as used throughout the specification, do not mean "step for ~".

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "unit" is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to execute on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided within components and units may be combined into a smaller number of components and units or further separated into additional components and units. In addition, components and units may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

In this specification, some of the operations or functions described as being performed by a terminal, apparatus, or device may instead be performed by a server connected to that terminal, apparatus, or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal, apparatus, or device connected to that server.

In this specification, some of the operations or functions described as mapping or matching with a terminal may be interpreted as mapping or matching the terminal's identifying data, that is, the terminal's unique number or the individual's identification information.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a system for providing a multi-object recognition service for companion animals using a camera according to an embodiment of the present invention. Referring to FIG. 1, the system 1 for providing a multi-object recognition service for companion animals using a camera may include at least one user terminal 100, a recognition service providing server 300, at least one camera 400, and at least one IoT device 500. However, the system 1 of FIG. 1 is only one embodiment of the present invention, and the present invention is not to be construed as limited by FIG. 1.

The components of FIG. 1 are generally connected through a network 200. For example, as shown in FIG. 1, the at least one user terminal 100 may be connected to the recognition service providing server 300 through the network 200. The recognition service providing server 300 may be connected to the at least one user terminal 100, the at least one camera 400, and the at least one IoT device 500 through the network 200. The at least one camera 400 may be connected to the recognition service providing server 300 through the network 200. The at least one IoT device 500 may be connected to the at least one user terminal 100, the recognition service providing server 300, and the at least one camera 400 through the network 200.

Here, a network means a connection structure that allows information to be exchanged between nodes such as a plurality of terminals and servers. Examples of such a network include a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), 5GPP (5th Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, the Internet, LAN (Local Area Network), Wireless LAN (Wireless Local Area Network), WAN (Wide Area Network), PAN (Personal Area Network), RF (Radio Frequency), Bluetooth networks, NFC (Near-Field Communication) networks, satellite broadcast networks, analog broadcast networks, and DMB (Digital Multimedia Broadcasting) networks.

Hereinafter, the term "at least one" is defined as covering both the singular and the plural, and it is self-evident that, even where the term "at least one" does not appear, each component may exist in the singular or plural and may denote the singular or plural. Whether each component is provided in the singular or plural may vary depending on the embodiment.

The at least one user terminal 100 may be a terminal that registers a companion animal using a web page, app page, program, or application related to the multi-object recognition service for companion animals using a camera and, when a camera 400 has been purchased, registers the camera 400 and information about the user's companion animal. The user terminal 100 may also be a terminal that, when there are a plurality of companion animals, outputs the result of identifying and tracking the multiple objects on the image data. In addition, the user terminal 100 may be the terminal of a user who, when abnormal behavior has been detected, receives information about the abnormal behavior or an alarm from the recognition service providing server 300 and approves a control command for the at least one IoT device 500.
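The alarm-and-approval role of the user terminal can be sketched as follows. This is only an illustrative sketch under stated assumptions: the class names `PendingCommand` and `ApprovalQueue`, the device identifier, and the action string are hypothetical and do not appear in the specification, which does not describe how approval is stored or transported.

```python
from dataclasses import dataclass, field


@dataclass
class PendingCommand:
    """Hypothetical IoT control command awaiting approval."""
    device_id: str
    action: str
    approved: bool = False


@dataclass
class ApprovalQueue:
    """Hypothetical server-side queue of commands the user must approve."""
    pending: dict = field(default_factory=dict)
    dispatched: list = field(default_factory=list)
    next_id: int = 1

    def raise_alert(self, pet_name: str, behavior: str,
                    device_id: str, action: str) -> dict:
        # The server proposes a control command and builds the alarm
        # that would be sent to the user terminal.
        command_id = self.next_id
        self.next_id += 1
        self.pending[command_id] = PendingCommand(device_id, action)
        return {"command_id": command_id,
                "message": f"{pet_name}: {behavior} detected"}

    def approve(self, command_id: int) -> bool:
        # Only a command approved from the user terminal is forwarded
        # to the IoT device; unknown or already handled ids are rejected.
        command = self.pending.pop(command_id, None)
        if command is None:
            return False
        command.approved = True
        self.dispatched.append((command.device_id, command.action))
        return True
```

Nothing is dispatched until `approve` is called with the identifier carried in the alarm, which mirrors the described flow in which the user approves the control command after being notified.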

Here, the at least one user terminal 100 may be implemented as a computer that can connect to a remote server or terminal through a network. The computer may include, for example, a notebook, desktop, or laptop equipped with navigation and a web browser. The at least one user terminal 100 may also be implemented as a terminal that can connect to a remote server or terminal through a network. For example, the at least one user terminal 100 may be a wireless communication device that ensures portability and mobility, and may include all kinds of handheld wireless communication devices such as navigation devices, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and Wibro (Wireless Broadband Internet) terminals, smartphones, smart pads, and tablet PCs.

The recognition service providing server 300 may be a server that provides a web page, app page, program, or application for the multi-object recognition service for companion animals using a camera. The recognition service providing server 300 may be a server that embeds at least one artificial intelligence algorithm for identifying multiple objects. The recognition service providing server 300 may also be a server that identifies companion animals in the image data collected through the camera 400 and, when there are a plurality of companion animals, performs a recognition process that designates them as multi-objects to identify and track them. Further, the recognition service providing server 300 may be a server that distinguishes the face and body of a companion animal, extracts expression data from the facial data, behavior data from the body data, and sound data from the image data, analyzes the companion animal's state, and then transmits a corresponding control to the IoT device 500.
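The final stage described above, from extracted expression, behavior, and sound data to an IoT control, can be sketched in Python. The specification does not disclose the analysis rules in this passage, so the rule table, the state names, and the device actions below are purely hypothetical placeholders; a real system would presumably replace the lookup with a trained model.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    """Per-animal features assumed to come from earlier extraction stages."""
    track_id: int
    expression: str   # derived from facial data
    behavior: str     # derived from body data
    sound: str        # derived from the audio carried with the image data


# Hypothetical rule table: (expression, behavior, sound) -> inferred state.
STATE_RULES = {
    ("anxious", "pacing", "whining"): "separation_anxiety",
    ("neutral", "waiting_at_bowl", "barking"): "hungry",
}

# Hypothetical mapping from an inferred state to (IoT device, action).
STATE_TO_CONTROL = {
    "separation_anxiety": ("speaker", "play_owner_voice"),
    "hungry": ("feeder", "dispense"),
}


def analyze_state(obs: Observation) -> str:
    # Any combination not covered by the rule table is treated as normal.
    return STATE_RULES.get((obs.expression, obs.behavior, obs.sound), "normal")


def control_for(obs: Observation) -> Optional[tuple]:
    # Returns the control to send to an IoT device, or None when no
    # action is needed for this companion animal.
    return STATE_TO_CONTROL.get(analyze_state(obs))
```

Because each `Observation` carries its own `track_id`, the same lookup can be applied independently to every tracked animal in a frame, which is the point of handling the animals as multiple objects.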

Here, the recognition service providing server 300 may be implemented as a computer that can connect to a remote server or terminal through a network. The computer may include, for example, a notebook, desktop, or laptop equipped with navigation and a web browser.

The at least one camera 400 may be a device that photographs a companion animal using a web page, app page, program, or application related to the multi-object recognition service for companion animals using a camera, and transmits the result to the recognition service providing server 300.

Here, the at least one camera 400 may be implemented as a computer that can connect to a remote server or terminal through a network. The computer may include, for example, a notebook, desktop, or laptop equipped with navigation and a web browser. The at least one camera 400 may also be implemented as a terminal that can connect to a remote server or terminal through a network. For example, the at least one camera 400 may be a wireless communication device that ensures portability and mobility, and may include all kinds of handheld wireless communication devices such as navigation devices, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and Wibro (Wireless Broadband Internet) terminals, smartphones, smart pads, and tablet PCs.

The at least one IoT device 500 may be a device that is driven according to control received from the recognition service providing server 300 using a web page, app page, program, or application related to the multi-object recognition service for companion animals using a camera. The IoT device 500 may be, but is not limited to, a feeder, a companion animal care robot, a light, a TV, or a speaker.

Here, the at least one IoT device 500 may be implemented as a computer that can connect to a remote server or terminal through a network. The computer may include, for example, a notebook, desktop, or laptop equipped with navigation and a web browser. The at least one IoT device 500 may also be implemented as a terminal that can connect to a remote server or terminal through a network. For example, the at least one IoT device 500 may be a wireless communication device that ensures portability and mobility, and may include all kinds of handheld wireless communication devices such as navigation devices, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and Wibro (Wireless Broadband Internet) terminals, smartphones, smart pads, and tablet PCs.

FIG. 2 is a block diagram illustrating the recognition service providing server included in the system of FIG. 1, and FIG. 3 is a diagram illustrating an embodiment in which the multi-object recognition service for companion animals using a camera according to an embodiment of the present invention is implemented.

Referring to FIG. 2, the recognition service providing server 300 may include a registration unit 310, a recognition unit 320, a transmission unit 330, a tagging unit 340, and an IoT control unit 350.

When the recognition service providing server 300 according to an embodiment of the present invention, or another server (not shown) operating in conjunction with it, transmits a companion animal multi-object recognition service application, program, app page, web page, or the like using a camera to the at least one user terminal 100, the at least one camera 400, and the at least one IoT device 500, these devices may install or open that application, program, app page, or web page. The service program may also be run on the at least one user terminal 100, the at least one camera 400, and the at least one IoT device 500 using a script executed in a web browser. Here, a web browser is a program that enables use of World Wide Web (WWW) services by receiving and displaying hypertext written in HTML (Hyper Text Mark-up Language); examples include Netscape, Explorer, and Chrome. An application means an application program on a terminal and includes, for example, an app executed on a mobile terminal (smartphone).

Referring to FIG. 2, the registration unit 310 may map the user terminal 100 and the camera 400 to each other and register them, based on input from the user terminal 100. The camera 400 may be installed or provided to face the space in which the companion animals live so as to photograph at least one companion animal, and may output image data obtained by photographing the at least one companion animal. When outputting the image data collected from the camera 400, the user terminal 100 may also output the result data obtained by recognizing the at least one companion animal.

Image-based biometric recognition first requires acquiring a biometric image. The most important point is to acquire the image when the face is frontal, because biometric information can be collected intact only from a frontal face, and the individual recognition rate improves only when images are collected in as consistent a form as possible. Unlike a person, a dog does not look straight ahead on request, so increasing the probability of acquiring a usable biometric image requires a technique that determines, in real-time video of 30 frames per second or more, whether the dog is facing the front. In addition, when the face is frontal, the technique must provide position information for segmenting the eye and nose regions that will be used for iris recognition or nose-print recognition. Dogs also have much more hair than people, so each individual shows large variations in appearance that are unrelated to face direction, and this limits the use of precise geometric information such as heatmaps. Therefore, in an embodiment of the present invention, whether the dog's face is frontal is determined based on a method of extracting information suited to estimating the dog's face direction, and when the face is frontal, the positions of the eyes and nose are provided for biometric image segmentation.

To this end, the dog's eyes and nose are detected in the image by a deep learning algorithm, five pieces of information for estimating face direction are extracted from the relative positions and sizes of the detected eyes and nose, and a machine learning classifier determines whether the face is frontal. If the face is frontal, the positions of the eyes and nose detected by deep learning are provided so that the eye and nose images to be used for biometric recognition can be segmented.

The method of determining whether a companion animal's face is frontal proceeds as follows. First, a deep learning algorithm detects the eyes and nose in the input dog image and outputs their coordinates. Next, the information to be used for the frontal-face determination is extracted from the position coordinates of the eyes and nose. The extracted information is then input to a machine learning classifier, which determines whether the dog is facing the front. If the face is frontal, the detected eye and nose positions are provided so that the eye and nose regions to be used for biometric recognition can be segmented from the image.

Images of the Maltese, Poodle, Shih Tzu, Yorkshire Terrier, and Pomeranian, the dog breeds most commonly raised in Korea, may be collected and classified as frontal or non-frontal. An image in which either of the two eyes or the nose is not visible can easily be judged non-frontal by counting the eyes and noses after detection. Therefore, only images in which both eyes and the nose are visible need to be collected. The training data may be labeled for the dog's eyes and nose using LabelImg, an open-source program distributed on GitHub by tzutalin. When labeling is complete, the class number and the normalized bounding box coordinates (x, y, w, h) for each image are saved as a text file.
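The label file described above can be read back with a few lines of code. The following is a minimal sketch, assuming the YOLO text format that LabelImg writes (one `class x_center y_center width height` line per box, all values normalized to the image size). The class numbering (0 for eye, 1 for nose) and the function names are hypothetical and are not taken from the patent.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    cls: int    # class number (assumed here: 0 = eye, 1 = nose)
    x: float    # left edge in pixels
    y: float    # top edge in pixels
    w: float    # width in pixels
    h: float    # height in pixels


EYE, NOSE = 0, 1  # hypothetical class numbering


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> List[Box]:
    """Convert normalized 'class xc yc w h' lines into pixel-space boxes."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        cls = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:])
        bw, bh = w * img_w, h * img_h
        boxes.append(Box(cls, xc * img_w - bw / 2, yc * img_h - bh / 2, bw, bh))
    return boxes


def has_two_eyes_and_nose(boxes: List[Box]) -> bool:
    """Quick non-frontal filter: require exactly two eyes and one nose."""
    eyes = sum(1 for b in boxes if b.cls == EYE)
    noses = sum(1 for b in boxes if b.cls == NOSE)
    return eyes == 2 and noses == 1
```

The count check mirrors the shortcut in the text: an image that does not contain two eyes and one nose can be rejected before any face-direction features are computed.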

<Deep learning-based pet eye and nose detection>

In a method that uses geometric information to judge face direction from the relative positions and sizes of facial features, how accurately each feature is located is very important. In addition, to increase the probability of collecting biometric images suitable for recognition from an uncooperative dog, real-time image processing at 30 frames per second or more must be possible. Therefore, an embodiment of the present invention may use the YOLOv3 and YOLOv4 algorithms, which can run in real time and offer high accuracy, to detect the dog's eyes and nose. YOLO (You Only Look Once) is a deep learning algorithm optimized for real-time processing, in which a single convolutional neural network (Convolutional Network) looks at the image once and performs object localization and classification simultaneously. The algorithm has been repeatedly revised and exists in several versions, of which YOLOv3 and YOLOv4 are the most recently released.

The object detection algorithm consists of three parts: Backbone, Neck, and Head (Dense Prediction). The Backbone is a pre-trained network used to extract feature maps, the Neck is a set of layers between the Backbone and the Head that collects feature maps from different stages of the Backbone, and the Head is the part that actually performs object detection. YOLOv3 uses darknet53 as the Backbone and an FPN (Feature Pyramid Network) as the Neck, and its Head simultaneously predicts a score indicating whether a bounding box contains an object and per-class probabilities for classifying that object. YOLOv4 improves on YOLOv3 by incorporating more recent deep learning techniques: it uses CSPdarknet53 as the Backbone and SPP (Spatial Pyramid Pooling) and PAN (Path Aggregation Network) as the Neck, while the Head keeps the YOLOv3 structure.

<Information extraction for frontal-face determination of pets>

The direction of a companion animal's face movement is expressed as Yaw, Pitch, and Roll, which denote rotation about the Z axis, the X axis, and the Y axis, respectively. A Roll deviation of the dog's face does not significantly deform the biometric image and can easily be corrected by computing the tilt angle of the line connecting the two eyes. Whether the face is frontal is therefore judged only from the Yaw and Pitch deviations. Applying the geometric method used for estimating human face direction, five elements that carry information about the Yaw and Pitch deviations are extracted and denoted A to E. The three interior angles of the triangle connecting the center points of the two eyes and the nose provide information about both the Yaw and Pitch directions. The ratio of the length of the line connecting the center points of the two eyes to the length of the perpendicular from that line to the nose provides information about the Pitch direction. The ratio of the distance from the foot of that perpendicular to the left eye to the distance from the foot to the right eye provides information about the Yaw direction. The ratio of the bounding box areas of the detected left and right eyes provides information about the Yaw direction. The ratios of the left-eye area and the right-eye area to the nose area, taken from the eye and nose bounding boxes, also provide information about the Yaw direction.
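The five elements can be computed directly from the three detected boxes. The sketch below is an illustration under stated assumptions: boxes are given as (x_center, y_center, width, height) in pixels, the letters A to E are assigned in the order the text lists the elements (the text itself does not state the mapping), and the function name `frontal_features` is hypothetical. It also returns the eye-line tilt that the text suggests for Roll correction. Degenerate layouts, such as a nose lying exactly on the eye line, are not handled.

```python
import math
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_center, y_center, width, height)


def _angle(p, q, r) -> float:
    """Interior angle at vertex q of the triangle p-q-r, in degrees."""
    v1 = (p[0] - q[0], p[1] - q[1])
    v2 = (r[0] - q[0], r[1] - q[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))


def _area(b: Box) -> float:
    return b[2] * b[3]


def frontal_features(left_eye: Box, right_eye: Box, nose: Box) -> Dict[str, object]:
    """Compute the five face-direction elements plus the roll angle."""
    l, r, n = left_eye[:2], right_eye[:2], nose[:2]
    ex, ey = r[0] - l[0], r[1] - l[1]          # eye-to-eye vector
    eye_dist = math.hypot(ex, ey)
    # Signed distance from the left eye to the foot of the perpendicular.
    t = ((n[0] - l[0]) * ex + (n[1] - l[1]) * ey) / eye_dist
    # Length of the perpendicular from the eye line to the nose.
    perp = abs((n[0] - l[0]) * ey - (n[1] - l[1]) * ex) / eye_dist
    return {
        "A": (_angle(r, l, n), _angle(l, r, n), _angle(l, n, r)),  # yaw + pitch
        "B": eye_dist / perp,                                      # pitch
        "C": abs(t) / abs(eye_dist - t),                           # yaw
        "D": _area(left_eye) / _area(right_eye),                   # yaw
        "E": (_area(left_eye) / _area(nose),
              _area(right_eye) / _area(nose)),                     # yaw
        "roll_deg": math.degrees(math.atan2(ey, ex)),              # for correction
    }
```

For a perfectly symmetric frontal face, C and D both equal 1 and the two eye angles in A are equal, which is the pattern a classifier would be expected to learn as "frontal".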

<Machine learning-based determination of whether the dog's face is frontal>

To determine whether the face is frontal from the five extracted pieces of face-direction information, an MLP (Multi-Layer Perceptron), which is a deep learning method, as well as RF (Random Forest) and SVM (Support Vector Machine) classifiers may be applied to the proposed method. The MLP classifier is a deep learning algorithm composed of an input layer, hidden layers, and an output layer. Factors that affect MLP performance include the number of hidden layers, the number of nodes in each layer, the activation function, the optimization algorithm (Optimizer), the L2 regularization parameter (Alpha), and the learning rate. The RF classifier is an algorithm that uses many decision tree classifiers of the same form, each of which classifies data in a tree structure, and makes the final classification by voting. Factors that affect RF performance include the number of decision trees (Estimator), the maximum depth (Max Depth), the minimum number of samples required to split a node (Min Samples Split), and the minimum number of samples required at a leaf node (Min Samples Leaf). The SVM classifier is an algorithm that classifies data by finding the decision boundary that maximizes the margin to the nearest samples. Factors that affect SVM performance include the kernel, the regularization parameter (C), and gamma. The hyper-parameters to tune for each classifier may be chosen from the variables that affect performance the most. A grid search may be used to find the best-performing combination, and K-fold cross-validation, which is now in common use, may be used to verify performance. Once registration and identification of the companion animal are complete in this way, the next stage is recognition.
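The tuning procedure described above (try every hyper-parameter combination, score each with K-fold cross-validation, keep the best) can be sketched without any external library. The example below is only an illustration: a tiny nearest-neighbor classifier stands in for the MLP, RF, and SVM classifiers named in the text, and all function names and the parameter grid are hypothetical. In practice a machine learning library would normally supply both the classifiers and the search.

```python
import itertools
import random
from typing import Dict, Iterator, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label: 1 = frontal)


def k_fold_splits(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def knn_predict(train: List[Sample], x: Sequence[float], k: int, p: int) -> int:
    """Stand-in classifier: majority vote among the k nearest samples."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(u - v) ** p for u, v in zip(a, b))
    nearest = sorted(train, key=lambda s: dist(s[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def grid_search(data: List[Sample], grid: Dict[str, List[int]], folds: int = 5):
    """Return (best_params, best_mean_accuracy) over every grid combination."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for train_idx, val_idx in k_fold_splits(len(data), folds):
            train = [data[i] for i in train_idx]
            correct = sum(
                knn_predict(train, data[i][0], **params) == data[i][1]
                for i in val_idx
            )
            scores.append(correct / len(val_idx))
        mean = sum(scores) / len(scores)
        if mean > best_score:  # keep the first combination on ties
            best_params, best_score = params, mean
    return best_params, best_score
```

The same loop structure applies to any classifier: only `knn_predict` and the grid keys would change when tuning, for example, the number of trees and maximum depth of a random forest.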

The recognition unit 320 may identify and track multiple objects corresponding to the at least one companion animal included in the image data received from the camera 400. The recognition unit 320 may identify and track the at least one companion animal based on its facial data, body data, and pattern data. The pattern data may be data that includes color or pattern, and the facial data may include facial expression data. Here, to draw an outline by extracting edge points from the facial data of the at least one companion animal, edge points between the components that make up the face are connected, and the multiple objects can be identified based on the lengths of the connecting lines between the components. The recognition unit 320 may also identify the multiple objects through nose print recognition of the at least one companion animal.

To identify and track a plurality of companion animals as multiple objects, an embodiment of the present invention may use two methods. The first identifies and tracks multiple objects using heterogeneous sensor data from a LiDAR and the camera 400. The second uses a parallelized CPU for object recognition and automatic tracking, because a companion animal often makes sudden turns or movements and, when the camera 400 is fixed, may be too small to capture well; in other words, a companion animal is a small object that moves irregularly.

<LiDAR and camera>

The LiDAR according to an embodiment of the present invention may be, for example, a 4-channel LiDAR system that emits a pulsed laser in the 905 nm band, acquires the optical signal through an APD, and converts it into a distance value. The scanning optics use a rotating flat mirror, which secures a maximum horizontal field of view of 120 degrees. The vertical resolution covers a final range of 5 degrees when the divergence angle of the laser is taken into account. A camera with a standard angle of view generally has a horizontal field of view of about 50 degrees, which does not match that of the LiDAR; to resolve this, a wide-angle lens covering 120 degrees is applied to the camera 400 so that the fields of view of the image and the LiDAR coincide. Both sensor signals may be processed at 30 fps in an embedded environment. The embedded system runs on an Nvidia Jetson TX2 board; the LiDAR signal may be received over Ethernet and the image over USB 3.0.

LiDAR data and camera data lie in different spaces and must be merged into a single space. To this end, the mathematical relationship between the two sensors is defined. To align the camera space with the LiDAR space, the image pixels at which the points seen by the LiDAR appear are detected, and a linear mathematical relationship between the two sensors is established from them. Taking the camera coordinate system as the reference, the rotation matrix and translation matrix are estimated, and the focal length and principal point, which are intrinsic parameters of the camera, are estimated, so that one LiDAR point is matched to one image pixel. As for detecting and recognizing multiple objects from the fused data, the 3D point data acquired through the LiDAR comprises 900 points per channel per frame, and fusing these points with image pixels one by one increases the processing load. To optimize the signal processing, the objects detected in the point data must first be clustered.
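The mapping from a LiDAR point to an image pixel can be illustrated with the standard pinhole camera model. The sketch below assumes that model with no lens distortion, which is a simplification: a 120-degree wide-angle lens would in practice likely need a distortion correction step as well. The function name and argument layout are hypothetical; `R` and `t` are the estimated rotation and translation from the LiDAR frame to the camera frame, `fx`, `fy` are focal lengths in pixels, and `cx`, `cy` are the principal point.

```python
from typing import Sequence, Tuple


def project_lidar_point(
    point: Sequence[float],
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Map a 3D LiDAR point to pixel coordinates (u, v)."""
    # Extrinsics: express the point in the camera coordinate system.
    xc, yc, zc = (
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: perspective division, then focal length and principal point.
    return fx * xc / zc + cx, fy * yc / zc + cy
```

With an identity rotation and zero translation, a point at (1.0, 0.5, 2.0) and intrinsics fx = fy = 100, cx = 320, cy = 240 lands at pixel (370, 265).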

Only the center value of each cluster of point data is converted into image space, and recognition proceeds on the basis of the corresponding pixel. The YOLO algorithm, one of the deep learning methods, may be applied to recognize the object. YOLO divides the image into a grid and then selects the class with the highest probability. This algorithm has difficulty recognizing an object accurately when the object appears small in the image, that is, when few pixels are available for it. Moreover, although deep learning has made recognition accuracy very high, questions about its reliability remain, so there are limits to relying on images alone.
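The clustering step can be sketched as follows. The text does not name a clustering algorithm, so this example assumes a simple scheme for illustration: points arrive in scan order, and a new cluster starts whenever the gap to the previous point exceeds a threshold. The names `cluster_scan`, `centroid`, and `max_gap` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters


def cluster_scan(points: List[Point], max_gap: float) -> List[List[Point]]:
    """Group consecutive scan points whose spacing is at most max_gap."""
    clusters: List[List[Point]] = []
    for p in points:
        if clusters and math.dist(clusters[-1][-1], p) <= max_gap:
            clusters[-1].append(p)   # close to the previous point: same object
        else:
            clusters.append([p])     # large jump: start a new object
    return clusters


def centroid(cluster: List[Point]) -> Point:
    """Center value of a cluster; only this point is projected to the image."""
    n = len(cluster)
    return (
        sum(p[0] for p in cluster) / n,
        sum(p[1] for p in cluster) / n,
        sum(p[2] for p in cluster) / n,
    )
```

Projecting one centroid per cluster instead of all 900 points per channel is what reduces the fusion workload described in the text.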

Accordingly, in an embodiment of the present invention, object recognition proceeds around the region acquired through the LiDAR, which overcomes the limitation of ordinary images. In addition, because the detection step relies on physically measured values rather than on an estimation algorithm, reliability can be secured. As a result, the two kinds of heterogeneous data are mapped into one space, and objects can be recognized around the positions obtained through the LiDAR. The method described above recognizes multiple objects using a LiDAR and the camera 400, and it does so by fusing a low-cost LiDAR with a camera so that households and small business owners who provide companion animal care services can easily purchase and use it. Combining the strengths of the heterogeneous sensors raises recognition accuracy and system reliability.

<Multi-object tracking using parallelization>

Deep learning, real-time object tracking, and automatic tracking are mostly performed on GPUs, but an embodiment of the present invention implements a multi-object tracker with improved speed using CPU parallelization alone, without a GPU. For multi-object tracking, a model of each object is created based on color, speed, and size, and, as a technique specific to companion animal observation video, the movement of each object is predicted from the movement of all objects. Because a companion animal's behavior and patterns change continuously over time, and the shape and color of an object also change with its position, the initial model is updated through learning; however, the learning rate may be set low so that the model stays robust against interference such as overlap between animals. To improve tracking speed, each object is assumed to move within a window N times its width and height centered on its current position, and tracking is performed only within that range so that unnecessary computation is avoided.
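The two speed and robustness measures in this paragraph, the restricted search window and the slowly updated model, can be sketched as below. This is an illustration under assumptions: the object model is represented as a plain feature vector (for example a color histogram), the update is written as an exponential moving average, and the names `search_window`, `update_model`, and the default `rate` are hypothetical rather than taken from the patent.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def search_window(box: Rect, n: float, frame_w: int, frame_h: int) -> Rect:
    """Window n times the box size, centered on the box, clipped to the frame."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    win_w, win_h = w * n, h * n
    left = max(0, int(cx - win_w / 2))
    top = max(0, int(cy - win_h / 2))
    right = min(frame_w, int(cx + win_w / 2))
    bottom = min(frame_h, int(cy + win_h / 2))
    return left, top, right - left, bottom - top


def update_model(model: Sequence[float], observation: Sequence[float],
                 rate: float = 0.05) -> List[float]:
    """Blend the new observation into the model; a small rate means slow learning."""
    return [(1 - rate) * m + rate * o for m, o in zip(model, observation)]
```

A small `rate` means that a few frames in which another animal overlaps the target change the stored model only slightly, which is the robustness the text aims for, at the cost of slower adaptation to genuine appearance changes.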

In dog cafes, dog hotels, and dog playgrounds that provide companion animal care services, the model update and tracking of each object can be performed independently, so parallelization yields a large speed improvement without loss of accuracy. However, the prediction based on the movement of all objects is possible only after tracking of every object has finished, so it is not parallelized. Parallelization on a high-performance GPU could make tracking faster, but it raises the price of the tracking server and is therefore unsuitable for small businesses. Accordingly, in an embodiment of the present invention, a plurality of companion animals in the shooting area are tracked using only the CPU, so that the method can be applied in households and in small businesses that provide care services. The OpenMP API may be used for parallelization, and OpenCV may be used for image processing and comparative experiments.
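The split between the parallel and sequential parts can be shown structurally. The text names OpenMP, which is a C/C++/Fortran API; the sketch below uses Python's standard thread pool purely to illustrate the control flow and does not reproduce OpenMP's CPU-level speedup. The per-object step `track_one` is a hypothetical placeholder that simply accepts the measured position.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple

Pos = Tuple[float, float]


def track_one(prev: Pos, measured: Pos) -> Pos:
    """Hypothetical per-object update; independent of every other object."""
    return measured


def track_frame(prev: Dict[str, Pos],
                measured: Dict[str, Pos]) -> Tuple[Dict[str, Pos], Pos]:
    """Update all objects in parallel, then compute the shared motion estimate."""
    ids = sorted(prev)
    # Parallel part: each object's update depends only on its own data.
    with ThreadPoolExecutor() as pool:
        updated = list(pool.map(lambda i: track_one(prev[i], measured[i]), ids))
    current = dict(zip(ids, updated))
    # Sequential part: needs every object's result, so it runs afterwards.
    n = len(ids)
    mean_dx = sum(current[i][0] - prev[i][0] for i in ids) / n
    mean_dy = sum(current[i][1] - prev[i][1] for i in ids) / n
    return current, (mean_dx, mean_dy)
```

In an OpenMP implementation, the parallel part would correspond to a parallel loop over objects, with the mean-displacement computation placed after the loop's implicit barrier.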

In dog cafes, dog playgrounds, dog hotels, and the like, companion animals of different sizes are mixed together, so even a moment of inattention by the manager can lead to a biting accident, and when the dogs differ in size class, the smaller dog may be killed within seconds. Dogs use their mouths the way people use their hands, and the instinct remains to bite prey, drive the teeth through the hide, and shake and tear until the prey stops moving. As a result, wounds tend to be at vital points; if the dog bites and shakes there, organs are punctured and the animal eventually dies, and even a bite elsewhere leads to serious injury such as tearing. For this reason the manager cannot leave even briefly to use the restroom or eat, so staff must work in shifts, which raises labor costs. Businesses also face civil suits for property damage (a dog is treated as property under civil law), and many of them close down as a result.

Accordingly, a spatial region for tracking object coordinates within the space where the dogs or cats can move, and an initial object region, may first be designated, and the tracking region may be re-designated either in real time or while paused to correct tracking errors. A model initialization function may further be included to cope with model changes caused by abrupt environmental changes or incorrect learning. To analyze multiple objects in real time in a multi-cat or multi-dog household, a dog playground, or a cafe, each object must be tracked in real time; for this purpose, OpenMP, a CPU parallelization API, may be used to track multiple objects in real time with high accuracy.

The transmission unit 330 may transmit the identification and tracking result data together with the image data. The image data may include sound data. Accordingly, expression data can be extracted from the facial data, behavior data from the body data, and sound data from the image data, and each can be analyzed. The method of collecting these data to identify companion animals and of extracting the state information of each companion animal from its behavior, expression, and sound is disclosed in the applicant's earlier Korean Registered Patent No. 10-2176934 (published November 10, 2020), so a detailed description is omitted.

What the earlier registered patent does not disclose is analysis based on sound data, which is an important element for knowing the state of a dog or cat. For this analysis, the pre-stored artificial intelligence algorithm may be an algorithm modeled using, as a learning data set and a training data set, the normal-state sound data and vibration data of at least one companion animal included in a list of at least one dog breed or cat breed, together with the sound data and vibration data for each type of abnormal state of the at least one companion animal.

To this end, the transmission unit 330 may first perform preprocessing, including noise removal, on the collected sound data and vibration data. If noise is not removed, it strongly distorts the results of analysis methods that depend heavily on the data, to the point where the results become meaningless. If the noise has already been removed in the transmission unit 330, this step may be skipped. Second, the transmission unit 330 may apply to the preprocessed sound data the weights that are mapped to the selected companion animal and stored in advance. These are the weights that gave the smallest error against the labels when the artificial intelligence algorithm was modeled, and they are stored mapped to an identifier such as the type of each companion animal. Accordingly, when the type and breed of a companion animal are searched for or selected on the user terminal 100, the corresponding identifier is extracted, and the weights mapped to that identifier can be extracted.

Third, the transmission unit 330 may convert the collected sound data into a spectrogram image using the STFT (Short-Time Fourier Transform). The ordinary Fourier transform produces errors when used for frequency analysis of non-stationary signals, so a method is needed that can still analyze frequency content at the moments where such errors arise. The STFT addresses this by dividing the target signal into frames, taking the stationarity of the signal into account, and applying the Fourier transform while sliding a window of fixed size; that is, a window function is applied to the signal to be analyzed and the Fourier transform is then performed. In the time domain the STFT is written $\mathrm{STFT}(t,\omega)=\int_{-\infty}^{\infty} x(\tau)\,w(\tau-t)\,e^{-j\omega\tau}\,d\tau$, where $x(\tau)$ is the signal to be analyzed, $w(\tau-t)$ is the window function, and $\omega$ is the angular frequency. A spectrogram is a tool for visualizing sound or waves that combines the characteristics of a waveform and a spectrum. A waveform shows how amplitude changes along the time axis, that is, in the time domain, and a spectrum shows how amplitude changes along the frequency axis, whereas a spectrogram shows amplitude differences across both the time and frequency axes as differences in print density or display color.
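
A discrete version of this transform can be sketched in plain Python as below. The Hann window, the frame length, and the hop size are assumptions chosen for the example, and only the magnitude is kept because that is what a spectrogram displays.

```python
import cmath
import math
from typing import List


def hann_window(length: int) -> List[float]:
    """Symmetric Hann window, playing the role of the window function w."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def stft_magnitude(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Return |STFT| with one row per frame and one column per frequency bin."""
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Apply the sliding window to the current frame.
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Discrete Fourier transform of the windowed frame at bin k.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(abs(coeff))
        spectrogram.append(row)
    return spectrogram


# A pure tone that completes 4 cycles per 32-sample frame.
tone = [math.sin(2.0 * math.pi * 4 * n / 32) for n in range(128)]
spec = stft_magnitude(tone, frame_len=32, hop=16)
peak_bins = [row.index(max(row)) for row in spec]
```

For the test tone every frame peaks in frequency bin 4, which is the bright horizontal line one would see if `spec` were rendered as an image.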

Fourth, the transmission unit 330 may extract feature data from the converted spectrogram image using a CNN (Convolutional Neural Network). Because the spectrogram is output as an image, comparing the sound data recorded by the user with the pre-stored sound data ultimately has to be done through image analysis. When computing the similarity between different images, however, a pixel-by-pixel comparison makes it difficult to judge how similar the images are. To solve this problem, there are methods that compare images by their extracted features rather than pixel by pixel, and the deep learning technique CNN is one of them. A CNN learns image features on its own, can classify images and recognize image patterns based on the learned features, has a network structure built on convolution layers that is well suited to image processing, and can classify images based on features within the image data given as input, which is why it is used in an embodiment of the present invention. The method is not limited to a CNN, however; any method capable of distinguishing images and extracting their features may be used.
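
To make the idea of feature extraction concrete, the sketch below runs a single convolution, activation, and pooling pass over a tiny image. It is an illustration under stated assumptions, not the trained network of the embodiment: the kernel is hand-set rather than learned, ReLU is assumed as the activation, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding) 2-D cross-correlation, as in a CNN convolution layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map: Matrix) -> Matrix:
    """Activation layer: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [[max(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


def extract_features(image: Matrix, kernel: Matrix) -> List[float]:
    """Convolution -> ReLU -> max pooling, flattened into a feature vector."""
    pooled = max_pool(relu(conv2d(image, kernel)))
    return [v for row in pooled for v in row]


# A 5x5 image with a vertical edge between the second and third columns.
edge_image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-set kernel that responds to left-to-right brightness increases.
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]
features = extract_features(edge_image, edge_kernel)
```

The resulting vector is non-zero only where the edge lies, so two images with the same structure yield similar vectors even if individual pixels differ.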

The CNN extracts the features of a spectrogram in vector form, and the similarity between images can be measured using the extracted features. To extract spectrogram features, an embodiment of the present invention may use a model based on the basic convolutional neural network structure consisting of a convolution layer, an activation function layer, and a max pooling layer, and may take the spectrogram feature vector from the layer preceding the softmax layer, which is used for image classification, before similarity is measured. Features for measuring similarity can be extracted from a CNN with a fully-connected layer and from a model with a GAP (Global Average Pooling) layer, which takes the average value over each feature map. As another model for measuring image similarity, a CNN-based autoencoder may be used, in which the encoder has a convolutional neural network structure and a decoder reconstructs the data compressed by the encoder. Spectrogram features are extracted from a specific layer of the trained autoencoder and passed through a GAP layer, and the resulting feature vector can be used for measuring image similarity. From the image feature vectors extracted by these three models, the Euclidean distance and cosine similarity can be measured and used to sort the spectrograms by degree of similarity; the sorted order reveals not only the data judged most similar to each spectrogram but also the data judged least similar.
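
The pooling and similarity measures named above can be sketched as follows. The labels and vectors are made-up values for illustration, and `rank_by_similarity` is a hypothetical helper that sorts by cosine similarity only.

```python
import math
from typing import Dict, List, Sequence


def global_average_pool(feature_maps: List[List[List[float]]]) -> List[float]:
    """Collapse each 2-D feature map to its mean: one value per channel."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Smaller values mean more similar vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Larger values mean more similar directions; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       stored: Dict[str, Sequence[float]]) -> List[str]:
    """Return stored labels ordered from most to least similar to the query."""
    return sorted(stored,
                  key=lambda label: cosine_similarity(query, stored[label]),
                  reverse=True)


# Hypothetical stored feature vectors and a query vector.
stored_vectors = {"calm": [1.0, 0.0], "whine": [1.0, 1.0], "bark": [0.0, 1.0]}
ranking = rank_by_similarity([1.0, 0.1], stored_vectors)
```

The first entry of `ranking` is the most similar stored item and the last is the least similar, which mirrors the sorted order described above.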

Once the feature data for the sound data has been extracted, feature data for the vibration data must be extracted next. Fifth, the transmission unit 330 may restructure the collected vibration data along the X, Y, and Z axes in the time domain and then extract feature data using the CNN. The underlying idea goes back to the motion of a weight over the time axis (time domain): a weight hanging on a spring moves up and down as time passes, tracing a graph as it travels back and forth between an upper and a lower limit around a center point. Animal sounds can likewise contain vibration data, such as growling or low barking. When a geomagnetic sensor, a 9-axis sensor, an acceleration sensor, or the like is used, it becomes possible to generate data on, and plot graphs of, the direction among the X, Y, and Z axes in which the vibration is applied, its strength, and its pattern. Because a graph generated in this way is also an image, the transmission unit 330 can extract feature data for the vibration data using the CNN described above.
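
One possible reading of the restructuring step is sketched below: raw sensor samples are ordered by time and split into one series per axis, from which direction and strength can be read off. The sample layout `(time, x, y, z)` and the peak-to-peak measure are assumptions for illustration; the description does not fix a data format.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float, float]  # (time in seconds, x, y, z)


def restructure_by_axis(samples: List[Sample]) -> Dict[str, List[float]]:
    """Order samples by time and split them into one time series per axis."""
    ordered = sorted(samples, key=lambda s: s[0])
    return {
        "t": [s[0] for s in ordered],
        "x": [s[1] for s in ordered],
        "y": [s[2] for s in ordered],
        "z": [s[3] for s in ordered],
    }


def peak_to_peak(series: List[float]) -> float:
    """Strength of the vibration on one axis: upper limit minus lower limit."""
    return max(series) - min(series)


def dominant_axis(channels: Dict[str, List[float]]) -> str:
    """Axis along which the vibration swings the most."""
    return max(("x", "y", "z"), key=lambda axis: peak_to_peak(channels[axis]))


# Hypothetical accelerometer samples, deliberately out of time order.
raw = [(0.02, 0.0, 0.5, 0.1), (0.00, 0.0, -0.5, 0.1), (0.01, 0.1, 0.0, 0.1)]
channels = restructure_by_axis(raw)
```

Each per-axis series can then be plotted against `channels["t"]`, giving the graph image that is passed to the CNN.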

Sixth, after the feature data for the measured sound data and vibration data has been extracted, the transmission unit 330 may compare it with the pre-stored feature data of the selected companion animal. For each of the at least one companion animal, the feature data of the sound data and vibration data for the normal and abnormal states has already been learned and stored. For the abnormal state, it is stored by type of abnormality, for example whether a dog is showing compulsive behavior due to stress, whining because it is hungry or misses its owner, or barking anxiously because it is sensitive to outside noise. Comparing the pre-learned feature data with the feature data of the measured sound data and vibration data therefore makes it possible to determine whether the state is normal or abnormal and, if abnormal, which type of abnormality it is. As described above, the Euclidean distance can be used for this feature data comparison.

Finally, the transmission unit 330 may select the pre-stored feature data with the highest similarity in the comparison. Once it is selected, a result report can be generated from it and transmitted to the user terminal 100, because the pre-stored feature data is mapped in advance to whether the state is normal or abnormal and, if abnormal, to the cause of the abnormality, its type, and a solution. If human intervention is required, the transmission unit 330 may contact a pet sitter who is currently available from the pet sitter list designated in advance on the user terminal 100, and may have the IoT device 500 open the gate or front door so that the pet sitter can look after the companion animal.
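
The selection and report step can be sketched as a nearest-match lookup. The `StoredPattern` record, its field names, and the example causes and solutions are hypothetical; highest similarity is taken here to mean the smallest Euclidean distance.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class StoredPattern:
    """Pre-stored feature data with the information mapped to it."""
    features: Sequence[float]
    status: str    # "normal" or "abnormal"
    kind: str      # type of abnormality, empty for normal
    cause: str
    solution: str


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_best_match(query: Sequence[float],
                      patterns: List[StoredPattern]) -> StoredPattern:
    """Pick the stored pattern closest to the measured feature data."""
    if not patterns:
        raise ValueError("no stored patterns to compare against")
    return min(patterns, key=lambda p: euclidean_distance(query, p.features))


def build_report(pattern: StoredPattern) -> Dict[str, str]:
    """Assemble the result report sent to the user terminal."""
    return {
        "status": pattern.status,
        "kind": pattern.kind,
        "cause": pattern.cause,
        "solution": pattern.solution,
    }


# Hypothetical stored patterns for one selected companion animal.
patterns = [
    StoredPattern((0.1, 0.2), "normal", "", "", ""),
    StoredPattern((0.9, 0.8), "abnormal", "anxious barking",
                  "sensitivity to outside noise", "play white noise"),
]
report = build_report(select_best_match((0.85, 0.75), patterns))
```

Because the metadata travels with the stored feature data, the report needs no further lookup once the best match is known.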

For training, the normal-state sound data and the sound data for each type of abnormal state may be converted into spectrogram images using the STFT (Short-Time Fourier Transform), and feature data may be extracted from the converted spectrogram images using a CNN (Convolutional Neural Network). The normal-state vibration data and the vibration data for each type of abnormal state may be restructured along the X, Y, and Z axes in the time domain, after which feature data may be extracted using the CNN. When feature data is extracted from the collected normal-state sound and vibration data and from the sound and vibration data for each type of abnormal state, the extracted feature data may be re-extracted on the basis of an LSTM (Long Short-Term Memory).

The tagging unit 340 may transmit to the user terminal 100 tags corresponding to the facial expression data extracted from the facial data and the behavior pattern data extracted from the body data. In humans, states such as hunger, depression, stress, boredom, thirst, joy, and sadness can show in facial expressions, but animals also express them through behavior. A stressed animal may, for example, bite its own tail or chew on a part of its body, so its state can also be identified from behavior patterns. Many YouTube channels made for dogs to watch have appeared recently; if the animal is bored, a video with a variety of sounds that stimulate curiosity can be played, and if it is anxious, a video of nature sounds or white noise that relieves stress can be played. If the room is too dark and the IoT device 500 is a light, the light can be turned on, and if ventilation is insufficient and a window can be opened and closed automatically by the IoT device 500, the window can be opened. If the IoT device 500 is a companion animal care robot, it can relieve boredom by moving along a path likely to interest a dog or cat, shaking its body, or moving in a specific pattern.

When abnormal behavior is detected as a result of analyzing the image data collected from the camera 400, the IoT control unit 350 may carry out the control process that is mapped to that abnormal behavior and stored in advance. Here, the IoT control unit 350 may control the turn-on or turn-off of at least one IoT device 500 linked through the home network to which the camera 400 is connected.
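
A minimal sketch of such a mapping is shown below. The behavior names, device names, and the `IoTDevice` class are hypothetical, and a real system would send commands over the home network instead of setting a flag.

```python
from typing import Dict, List, Tuple


class IoTDevice:
    """Stand-in for a device reachable through the home network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.is_on = False

    def set_power(self, on: bool) -> None:
        self.is_on = on


# Each control process is a list of (device name, desired power state) steps.
ControlProcess = List[Tuple[str, bool]]

# Hypothetical mapping from a detected abnormal behavior to its control process.
CONTROL_PROCESSES: Dict[str, ControlProcess] = {
    "room_too_dark": [("light", True)],
    "anxious_barking": [("speaker", True)],
}


def run_control_process(behavior: str,
                        devices: Dict[str, IoTDevice],
                        processes: Dict[str, ControlProcess] = CONTROL_PROCESSES
                        ) -> List[str]:
    """Apply the steps mapped to the behavior; return the devices that were set."""
    applied = []
    for device_name, power in processes.get(behavior, []):
        device = devices.get(device_name)
        if device is None:
            continue  # device is not linked on this home network
        device.set_power(power)
        applied.append(device_name)
    return applied


home = {"light": IoTDevice("light")}
changed = run_control_process("room_too_dark", home)
```

Keeping the mapping in a table rather than in code branches lets new behaviors or devices be added without changing the control logic.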

Hereinafter, the operation according to the configuration of the recognition service providing server of FIG. 2 described above is explained in detail using FIG. 3 as an example. The embodiment is, however, only one of the various embodiments of the present invention, and the invention is evidently not limited to it.

Referring to FIG. 3, (a) the recognition service providing server 300 collects various information according to the type of companion animal and divides it into face, body, and sound, deriving facial expressions from the face and behavior from the body, so that the information is classified into facial data, behavior data, and sound data. It then completes the artificial intelligence algorithm through training and testing so that these can be used for facial recognition, behavior pattern analysis, and sound analysis, respectively. (b) The recognition service providing server 300 maps the registration information collected from the user terminal 100 and the camera 400 to the user terminal 100 and stores them, and once the camera 400 starts capturing, it collects and analyzes the footage in real time.

Referring to (c) and (d), the recognition service providing server 300 transmits the result of recognizing each identified object to the user terminal 100 based on the analysis result and, as in (a) of FIG. 4, may control the IoT device 500 according to the analyzed state. As in (b), if the modeled artificial intelligence algorithm has made an incorrect analysis, for example when feedback is received from the user terminal 100, an administrator terminal (not shown) determines whether it is a clear error or a misperception by the user, and if it is a clear error, the modeled artificial intelligence algorithm may be retrained with the feedback as a new input.

Matters not described regarding the method for providing a multi-object recognition service for companion animals using a camera of FIGS. 2 to 4 are the same as, or can be easily inferred from, what was described above with reference to FIG. 1, and their description is therefore omitted.

FIG. 5 is a diagram illustrating a process in which data is transmitted and received between the components included in the system of FIG. 1 for providing a multi-object recognition service for companion animals using a camera, according to an embodiment of the present invention. An example of this process is described below with reference to FIG. 5, but the present application is not to be construed as limited to this embodiment, and it is apparent to those skilled in the art that the data transmission and reception process shown in FIG. 5 may be changed according to the various embodiments described above.

Referring to FIG. 5, the recognition service providing server registers, from the user terminal, a mapping between the user terminal and the camera (S5100), and identifies and tracks multiple objects corresponding to at least one companion animal included in the image data received from the camera (S5200).

The recognition service providing server then transmits the identified and tracked result data together with the image data (S5300).

The order of the above-described steps (S5100 to S5300) is merely an example and is not limiting. That is, the order of the steps (S5100 to S5300) may be changed, and some of the steps may be executed simultaneously or omitted.
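
The three steps can be sketched as methods of a single server object. This is an illustration under stated assumptions: the class name, the payload layout, and `stub_recognizer` (a stand-in for the trained model) are hypothetical and do not appear in the description.

```python
from typing import Any, Callable, Dict, List

# A recognizer takes image data and returns one entry per identified animal.
Recognizer = Callable[[Any], List[Dict[str, Any]]]


class RecognitionServer:
    def __init__(self, recognizer: Recognizer) -> None:
        self._recognizer = recognizer
        self._terminal_by_camera: Dict[str, str] = {}

    def register(self, terminal_id: str, camera_id: str) -> None:
        """S5100: map the camera to the user terminal and store the pair."""
        self._terminal_by_camera[camera_id] = terminal_id

    def identify_and_track(self, image: Any) -> List[Dict[str, Any]]:
        """S5200: identify and track the animals in the image data."""
        return self._recognizer(image)

    def transmit(self, camera_id: str, image: Any) -> Dict[str, Any]:
        """S5300: bundle the result data with the image data for the terminal."""
        if camera_id not in self._terminal_by_camera:
            raise KeyError(f"camera {camera_id!r} is not registered")
        return {
            "terminal_id": self._terminal_by_camera[camera_id],
            "image": image,
            "results": self.identify_and_track(image),
        }


def stub_recognizer(image: Any) -> List[Dict[str, Any]]:
    """Stand-in for the trained model: pretends two animals were found."""
    return [{"track_id": 1, "label": "dog"}, {"track_id": 2, "label": "cat"}]


server = RecognitionServer(stub_recognizer)
server.register("terminal-1", "camera-1")
payload = server.transmit("camera-1", b"frame-bytes")
```

Here recognition runs inside `transmit`, which is one way of merging S5200 and S5300; the steps could equally be called separately or in parallel.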

Matters not described regarding the method for providing a multi-object recognition service for companion animals using a camera of FIG. 5 are the same as, or can be easily inferred from, what was described above with reference to FIGS. 1 to 4, and their description is therefore omitted.

The method for providing a multi-object recognition service for companion animals using a camera according to the embodiment described with reference to FIG. 5 may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or program module executed by a computer. A computer-readable medium can be any available medium that can be accessed by a computer, and includes volatile and non-volatile media and removable and non-removable media. The computer-readable medium may also include any computer storage medium. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The method for providing a multi-object recognition service for companion animals using a camera according to an embodiment of the present invention described above may be executed by an application installed on a terminal by default (which may include a program included in a platform, operating system, or the like installed on the terminal by default), or by an application (that is, a program) that a user installs directly on a master terminal through an application providing server such as an application store server or a web server associated with the application or the service. In this sense, the method may be implemented as an application (that is, a program) installed on a terminal by default or installed directly by a user, and may be recorded on a computer-readable recording medium of the terminal or the like.

The above description of the present invention is given for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can be easily modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

delete

A system for providing a multi-object recognition service for companion animals using a camera, the system comprising:
a camera installed or provided to face a space in which at least one companion animal lives, the camera photographing the at least one companion animal and outputting image data of the at least one companion animal;
a user terminal for outputting, when the image data collected from the camera is output, result data obtained by recognizing the at least one companion animal; and
a recognition service providing server including a registration unit for mapping and registering the user terminal and the camera from the user terminal, a recognition unit for identifying and tracking multiple objects corresponding to the at least one companion animal included in the image data received from the camera, and a transmission unit for transmitting the identified and tracked result data together with the image data,
wherein the recognition unit
identifies and tracks the at least one companion animal based on facial data including facial expression data, body data, and pattern data including the color or pattern of the at least one companion animal, and
identifies the multiple objects through nose print recognition of the at least one companion animal, or extracts edge points from the facial data of the at least one companion animal to draw an outline, connects the edge points between at least one component forming the face, and identifies the multiple objects based on the distances of the connecting lines between the components,
wherein the recognition service providing server further includes
a tagging unit for transmitting, to the user terminal, tags corresponding to the facial expression data extracted from the facial data and the behavior pattern data extracted from the body data, and
an IoT control unit for carrying out, when abnormal behavior is detected as a result of analyzing the image data collected from the camera, a control process mapped to the abnormal behavior and stored in advance,
wherein the IoT control unit controls the turn-on or turn-off of at least one IoT device linked through a home network to which the camera is connected, and
wherein the image data includes sound data.

delete