KR20230029444A

KR20230029444A - Platform apparatus for directly generating a cooking diary from cooking video obtained from a camera of a range hood or cooking device using deep learning technology

Info

Publication number: KR20230029444A
Application number: KR1020210111946A
Authority: KR
Inventors: 이승호
Original assignee: 주식회사 에이시티
Priority date: 2021-08-24
Filing date: 2021-08-24
Publication date: 2023-03-03
Also published as: KR102615248B1

Abstract

The present invention relates to a platform device for generating a cooking diary from a cooking video using deep learning technology. According to the present invention, the platform device for generating a cooking diary comprises: at least one camera for photographing a cooking video in which a chef cooks while explaining a cooking process; a server for receiving the cooking video transmitted from the at least one camera and generating a cooking diary; and a cooking information DB for storing cooking-related raw materials, cooking methods, cooking utensils, cooking energy information, and text data or voice data related to cooking.

Description

Platform apparatus for directly generating a cooking diary from cooking video obtained from a camera of a range hood or cooking device using deep learning technology

The present invention relates to a platform device for generating a cooking diary from a cooking video using deep learning technology, and more specifically to a platform device that can generate a cooking diary simply from a cooking video by means of voice recognition and deep learning technology.

In general, chefs and people who enjoy cooking make and use cooking diaries. Because it is difficult to remember every dish, they keep a cooking diary that summarizes cooking processes and recipes.

Conventionally, chefs and other cooking enthusiasts have made cooking diaries by writing down by hand, in a notebook, the cooking processes and recipes shown on TV, on SNS, or at hands-on training sites, and by taking photographs and pasting them into the notebook.

However, this conventional approach makes a cooking diary difficult to produce: it takes a long time, it is hard to write down speech that passes in an instant, and it is hard to capture images. In addition, once a diary has been made, it is difficult to modify or add to it.

Accordingly, there has been a demand for a way of generating a cooking diary that is easy and simple and that allows the text and images of the diary to be easily modified or supplemented.

Korean Patent Application Publication No. 10-2021-0096956 (2021.08.06.)

Accordingly, an object of the present invention is to provide a platform device for generating a cooking diary from a cooking video using deep learning technology that can overcome the conventional problems described above.

Another object of the present invention is to provide a platform device for generating a cooking diary from a cooking video using deep learning technology, with which a cooking diary can be made simply by shooting a cooking video and the text and images of the diary can be easily modified or supplemented.

According to an embodiment of the present invention for achieving some of the above technical objects, a platform device for generating a cooking diary comprises: at least one camera for shooting a cooking video in which a chef cooks while explaining the cooking process; a server that receives the cooking video transmitted from the at least one camera and generates a cooking diary; and a cooking information DB that stores cooking-related raw ingredients, cooking methods, cooking utensils, cooking energy information, and cooking-related text data or voice data. The server comprises: a video receiving unit that receives the cooking video transmitted from the at least one camera; a video separation unit that assigns time labels to the received cooking video and separates it into voice data and image data; a voice data processing unit that uses an STT (Speech to Text) module to recognize the voice data and convert it into text data, at the same time recognizes the cooking steps into which the cooking is divided, and provides the step-by-step text data recognized within each cooking step so that it can be combined with the bottom of the corresponding step-by-step representative image; and an image data processing unit that, for each cooking step recognized by the voice data processing unit, extracts and designates the image at the time the cooking step was recognized as the step-by-step representative image, places the extracted representative image at a predetermined position, and combines the step-by-step text data provided by the voice data processing unit with the bottom of the representative image.

The cooking video is shot while the chef utters a specific term that marks each cooking step, and the voice data processing unit may recognize the time at which the specific term is recognized as the time at which the corresponding cooking step starts.

When recognizing the voice data, the voice data processing unit may compare a term with a low recognition rate against the terms stored in the cooking information DB and select and display the most similar term; if the term has no similarity to any stored term, or its similarity falls below a certain level, the unit may recognize it as a new term and store it in the cooking information DB.

The voice data processing unit may comprise: an STT conversion unit that uses an STT (Speech to Text) module to recognize the voice data and convert it into text data; a cooking step recognition unit that recognizes cooking steps from the specific terms recognized by the STT conversion unit; and a step-by-step text data generation unit that provides the step-by-step text data recognized within each cooking step to the image data processing unit, together with time label information for the time at which each cooking step was recognized, so that the text data can be combined with the bottom of the step-by-step representative image.

The image data processing unit may comprise: a representative image extraction unit that, according to the time label information provided by the voice data processing unit, extracts and designates the image at the time the cooking step was recognized as the step-by-step representative image; an image data preparation unit that places the extracted representative image at a predetermined position; a cooking information combination unit that combines the step-by-step text data provided by the voice data processing unit with the bottom of the representative image placed at the predetermined position, thereby generating a cooking step image for each step; and a diary generation unit that combines the cooking step images in time label order to generate a cooking diary for one dish.

According to the present invention, a cooking diary can be made simply by shooting a cooking video. In addition, the text and images of the cooking diary can be easily modified or supplemented.

FIG. 1 is a schematic block diagram of a platform device for generating a cooking diary according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of the server of FIG. 1;
FIG. 3 is a schematic block diagram of the voice data processing unit of FIG. 2;
FIG. 4 is a schematic block diagram of the image data processing unit of FIG. 2;
FIG. 5 is an operation flowchart of the platform device of FIG. 1; and
FIGS. 6 to 9 show the cooking step image for each step.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings, with no intention other than to provide those of ordinary skill in the art to which the present invention pertains with a thorough understanding of the invention.

FIG. 1 is a schematic block diagram of a platform device for generating a cooking diary according to an embodiment of the present invention, FIG. 2 is a schematic block diagram of the server of FIG. 1, FIG. 3 is a schematic block diagram of the voice data processing unit of FIG. 2, and FIG. 4 is a schematic block diagram of the image data processing unit of FIG. 2.

As shown in FIGS. 1 to 4, the platform device 500 for generating a cooking diary according to an embodiment of the present invention comprises a server 100, at least one camera 200, and a cooking information DB 300.

The at least one camera 200 is for shooting a cooking video in which a chef cooks while explaining the cooking process step by step, and one or more cameras may be provided. When a plurality of cameras 200 are provided, they are installed on the kitchen range hood and around the cooking table, respectively, and a single cooking video can be shot with a main camera and sub cameras exchanging roles according to the chef's position along the line of movement while cooking. The at least one camera 200 may be an ordinary camera, and may also include a camera mounted on a smartphone.

The at least one camera 200 may be installed on the underside of the range hood, on the chef's head, on the cooking table, and so on; by the chef's operation or under the control of the server 100, one of them serves as the main camera and the others serve as sub cameras.

The cooking information DB 300 stores cooking-related raw ingredients, cooking methods, cooking utensils, cooking energy information, and cooking-related text data or voice data.

The cooking information DB 300 stores all voice data that may be uttered while cooking, such as the ingredients, cooking methods, cooking utensils, and cooking energy (burner) information that a chef typically mentions when explaining a dish, together with the corresponding text data. In addition, data from e-books on cooking may be added to the cooking information DB 300, and various recipes published on the Internet and SNS may be collected in a scrolling manner and stored as data in categories classified by a machine learning algorithm.

The server 100 receives the cooking video transmitted from the at least one camera 200 and processes it to generate a cooking diary.

The server 100 comprises a video receiving unit 110, a video separation unit 120, a voice data processing unit 130, and an image data processing unit 140.

The video receiving unit 110 receives the cooking video transmitted from the at least one camera 200. The cooking video may be a video of a chef cooking while explaining the dish step by step.

The video separation unit 120 assigns time labels to the cooking video received through the video receiving unit 110 and separates it into voice data and image data.
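
The separation step can be pictured with a small data model. The sketch below is only an illustration under stated assumptions: it assumes the video has already been decoded into a list of frames at a constant frame rate and a list of fixed-length audio chunks, and the names `TimeLabeled`, `label_frames`, `label_audio_chunks`, and `separate` are hypothetical and do not come from the patent. It shows how each separated item might carry a time label so that voice data and image data can be matched up later.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class TimeLabeled:
    """One separated item (a frame or an audio chunk) with its time label."""
    time_label: float  # seconds from the start of the cooking video
    payload: Any


def label_frames(frames: List[Any], fps: float) -> List[TimeLabeled]:
    # Frame i is shown at i / fps seconds (assumes a constant frame rate).
    return [TimeLabeled(i / fps, frame) for i, frame in enumerate(frames)]


def label_audio_chunks(chunks: List[Any], chunk_seconds: float) -> List[TimeLabeled]:
    # Chunk i starts at i * chunk_seconds (assumes fixed-length chunks).
    return [TimeLabeled(i * chunk_seconds, chunk) for i, chunk in enumerate(chunks)]


def separate(
    frames: List[Any], chunks: List[Any], fps: float, chunk_seconds: float
) -> Tuple[List[TimeLabeled], List[TimeLabeled]]:
    """Return (voice_data, image_data), both carrying time labels."""
    return label_audio_chunks(chunks, chunk_seconds), label_frames(frames, fps)
```

Because both streams share the same time axis, a time label found in the voice data can later be used directly to look up an image.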

The voice data processing unit 130 uses an STT (Speech to Text) module to recognize the voice data and convert it into text data. At the same time, it recognizes the cooking steps into which the cooking is divided and provides the step-by-step text data recognized within each cooking step to the image data processing unit 140 so that the text data can be combined with the bottom of the corresponding step-by-step representative image.

The voice data processing unit 130 comprises an STT conversion unit 132, a cooking step recognition unit 134, and a step-by-step text data generation unit 136.

The STT conversion unit 132 uses an STT (Speech to Text) module to recognize the voice data and convert it into text data.

Through deep learning, the STT conversion unit 132 learns all voice data that may be uttered while cooking, such as the ingredients, cooking methods, cooking utensils, and cooking energy (burner) information that a chef may mention when explaining a dish, together with the corresponding text data, and uses this to recognize the speech in the voice data. In addition, when recognizing the voice data, a term with a low recognition rate is compared against the terms stored in the cooking information DB 300, and the most similar term is selected and displayed; if the term has no similarity to any stored term, or its similarity falls below a certain level, it is recognized as a new term and stored in the cooking information DB 300 according to a machine learning algorithm.
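
The term-matching behaviour described above can be sketched as follows. This is a minimal illustration, not the patent's method: the patent does not say how similarity is measured, so the sketch assumes a character-level ratio from Python's standard `difflib.SequenceMatcher`, an arbitrary threshold of 0.6, and an in-memory set standing in for the cooking information DB. The function name `resolve_term` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Set, Tuple


def resolve_term(term: str, vocabulary: Set[str], threshold: float = 0.6) -> Tuple[str, bool]:
    """Resolve a low-confidence recognized term against the stored vocabulary.

    Returns (resolved_term, is_new). If no stored term is similar enough,
    the term is treated as new and added to the vocabulary.
    """
    best_term, best_score = None, 0.0
    for known in sorted(vocabulary):  # sorted only to make tie-breaking deterministic
        score = SequenceMatcher(None, term, known).ratio()
        if score > best_score:
            best_term, best_score = known, score

    if best_term is None or best_score < threshold:
        vocabulary.add(term)  # assumed stand-in for storing a new term in the DB
        return term, True
    return best_term, False
```

For example, with a vocabulary of `{"gelatin", "shrimp", "carrot"}`, the misrecognized "gelatine" resolves to "gelatin", while an unrelated word such as "sumac" falls below the threshold and is stored as a new term.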

The cooking step recognition unit 134 recognizes cooking steps from the specific terms recognized by the STT conversion unit 132. That is, the time at which a specific term is recognized is taken as the time at which the corresponding cooking step starts. To let the cooking step recognition unit 134 recognize the steps, the cooking video is shot with the chef uttering a specific term that marks the preparation step and each cooking step. For example, if the chef says "cooking preparation" or "I will start cooking", this can be recognized as the preparation step; and if the chef uses expressions such as "step 1, step 2, ...", "first, second, ...", or "1st, 2nd, ...", the STT conversion unit 132 converts them into text and each cooking step can be recognized from the converted text. Thus, a sentence in the converted text data that contains a keyword such as 'step 1' can be judged to describe the cooking method of step 1.
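
Keyword-based step recognition of this kind can be sketched with regular expressions. The example below is an assumption-laden illustration: it works on English transcript sentences rather than the Korean phrases in the source, the marker phrases and the names `detect_step` and `step_start_times` are hypothetical, and each transcript segment is assumed to arrive as a `(time_label, sentence)` pair.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Marker phrases are illustrative stand-ins for the "specific terms" in the text.
PREPARATION = re.compile(r"\b(?:cooking preparation|start cooking)\b", re.IGNORECASE)
NUMBERED = re.compile(r"\bstep\s+(\d+)\b", re.IGNORECASE)
ORDINAL = re.compile(r"\b(first|second|third)\b", re.IGNORECASE)
ORDINAL_VALUES = {"first": 1, "second": 2, "third": 3}


def detect_step(sentence: str) -> Optional[str]:
    """Return the step marked by this sentence, or None if it has no marker."""
    if PREPARATION.search(sentence):
        return "preparation"
    match = NUMBERED.search(sentence)
    if match:
        return f"step {int(match.group(1))}"
    match = ORDINAL.search(sentence)
    if match:
        return f"step {ORDINAL_VALUES[match.group(1).lower()]}"
    return None


def step_start_times(segments: Iterable[Tuple[float, str]]) -> Dict[str, float]:
    """Map each recognized step to the time label of its first marker."""
    starts: Dict[str, float] = {}
    for time_label, sentence in segments:
        step = detect_step(sentence)
        if step is not None and step not in starts:
            starts[step] = time_label
    return starts
```

A plain keyword match will also fire on ordinary uses of words like "first", so a real system would likely need stricter marker phrases or additional context.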

The step-by-step text data generation unit 136 provides the step-by-step text data recognized within each cooking step to the image data processing unit 140 so that it can be combined with the bottom of the step-by-step representative image.

For example, in the preparation step, if the chef says "I will start cooking" and then explains, "The name of the dish is '새우찬묵' (Shrimp Chanmuk), and the ingredients used in this dish are shrimp, egg, peas, potato, carrot, and lemon", the step-by-step text data generation unit 136 can identify the preparation step by recognizing "I will start cooking" and can generate the text data of the preparation step by recognizing the name and ingredients explained afterwards.

In general, the first footage of the ingredients shows the chef laying out the ingredients needed for the dish on the table and explaining each one, beginning with a sentence such as "The ingredients used in this dish are ...". The text corresponding to this explanation is extracted as the raw ingredients of the dish and generated as the step-by-step text data of the preparation step. The generated text data is transmitted to the image data processing unit 140 together with the time label information for the time at which the explanation "The ingredients used in this dish ..." begins. In this case, voice recognition may be easier if information corresponding to the recipe the chef is cooking is already stored in the cooking information DB 300.

Likewise, if the sentence following "Step 1 ..." is "Soak the gelatin in water for 20 minutes, then heat it over low heat for about 5 minutes to dissolve it", this is recognized as the description of the cooking method of step 1 and generated as the text data of step 1. The time of the time label at which the text 'step 1' is located is identified, and this time label information is transmitted to the image data processing unit 140 either together with the step-by-step text data or at a different time. That is, the time label information may be transmitted first and the text data afterwards, or both may be transmitted at the same time.
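
The grouping of transcript text into step-by-step text data, each with the time label of its marker, can be sketched as below. This is an illustration under assumptions: the transcript is assumed to be a list of `(time_label, sentence)` pairs in English, the marker pattern and the names `StepText` and `build_step_texts` are hypothetical, and text spoken before the first marker is simply dropped.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Illustrative markers only; a sentence must begin with one of them.
MARKER = re.compile(r"^\s*(step\s+\d+|i will start cooking)\b[\s,.:]*", re.IGNORECASE)


@dataclass
class StepText:
    label: str          # "preparation" or "step N"
    time_label: float   # time at which the marker was recognized
    sentences: List[str] = field(default_factory=list)


def build_step_texts(segments: Iterable[Tuple[float, str]]) -> List[StepText]:
    """Group sentences under the most recent step marker."""
    steps: List[StepText] = []
    for time_label, sentence in segments:
        match = MARKER.match(sentence)
        if match:
            marker = " ".join(match.group(1).lower().split())
            label = marker if marker.startswith("step") else "preparation"
            steps.append(StepText(label, time_label))
            sentence = sentence[match.end():]  # keep only the text after the marker
        if steps and sentence.strip():
            steps[-1].sentences.append(sentence.strip())
    return steps
```

Each resulting `StepText` carries both pieces of information that the text says are handed to the image data processing unit: the step's text and the time label of its marker.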

For each cooking step recognized by the voice data processing unit 130, the image data processing unit 140 extracts and designates the image at the time the cooking step was recognized as the step-by-step representative image, places the extracted representative image at a predetermined position, and combines the step-by-step text data provided by the voice data processing unit 130 with the bottom of the representative image to generate the cooking diary.

The image data processing unit 140 comprises a representative image extraction unit 142, an image data preparation unit 144, a cooking information combination unit 146, and a diary generation unit 148.

The representative image extraction unit 142 extracts and designates the image at the time the cooking step was recognized as the step-by-step representative image, according to the time label information provided by the step-by-step text data generation unit 136 of the voice data processing unit 130.

For example, for cooking step 1, the representative image extraction unit 142 captures, from the image data separated by the video separation unit 120, the video frame at the time corresponding to the time label received from the step-by-step text data generation unit 136, that is, the time at which "step 1 ..." was uttered, and extracts it as the representative image of step 1.
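
Picking the frame that corresponds to a time label reduces to simple arithmetic when the frame rate is constant. The sketch below assumes exactly that, plus an in-memory list of frames; the names `frame_index_for` and `extract_representatives` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def frame_index_for(time_label: float, fps: float, frame_count: int) -> int:
    """Index of the frame closest to time_label, clamped to the valid range."""
    index = int(round(time_label * fps))
    return max(0, min(index, frame_count - 1))


def extract_representatives(
    frames: List[Any], fps: float, time_labels: Iterable[float]
) -> Dict[float, Any]:
    """Map each step's time label to the frame captured at that moment."""
    return {t: frames[frame_index_for(t, fps, len(frames))] for t in time_labels}
```

Clamping guards against a time label that falls slightly past the last frame, which can happen when the audio track runs a little longer than the video track.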

In this way, a representative image can be extracted for each step.

The image data preparation unit 144 places the step-by-step representative image extracted by the representative image extraction unit 142 at a position predetermined for generating the cooking diary.

The cooking information combination unit 146 combines the step-by-step text data provided by the step-by-step text data generation unit 136 of the voice data processing unit 130 with the bottom of the representative image placed at the predetermined position by the image data preparation unit 144, thereby generating a cooking step image for each step.

The diary generation unit 148 combines the cooking step images in time label order to generate a cooking diary for one dish.
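
Assembling the diary in time label order can be sketched as a sort followed by a simple layout pass. The example is a plain-text stand-in under assumptions: the real cooking step images are composed pictures, whereas here each page is represented by a hypothetical `StepPage` record holding an image identifier and its caption text, and `build_diary` renders them as lines of text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class StepPage:
    time_label: float  # time at which the step was recognized
    title: str         # e.g. "Preparation", "Step 1"
    image: str         # placeholder identifier for the representative image
    text: str          # step-by-step text shown below the image


def build_diary(dish_name: str, pages: Iterable[StepPage]) -> str:
    """Combine step pages in time label order into one diary for a dish."""
    ordered: List[StepPage] = sorted(pages, key=lambda page: page.time_label)
    lines = [dish_name]
    for page in ordered:
        # Image first, then its text underneath, mirroring the described layout.
        lines += ["", f"[{page.title}]", f"(image: {page.image})", page.text]
    return "\n".join(lines)
```

Sorting by time label means the pages can be produced in any order, for instance when representative images and text data arrive at different times.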

The operation of the platform device 500 for generating the cooking diary is described below with reference to FIG. 5.

FIG. 5 is a schematic operation flowchart of the platform device, and FIGS. 6 to 9 show the cooking step images for each step.

As shown in FIG. 5, the platform device 500 shoots a cooking video using the at least one camera 200 and transmits it to the server 100 in order to generate a cooking diary (S110).

The server 100 assigns time labels to the received cooking video and separates it into voice data and image data (S112).

The separated voice data is converted into text by the STT conversion module of the voice data processing unit 130; the cooking steps are recognized from the specific terms uttered by the chef, and step-by-step text data is generated within each cooking step and provided to the image data processing unit 140 (S114).

Meanwhile, for each cooking step recognized by the voice data processing unit 130, the image data processing unit 140 extracts the image at the time the cooking step was recognized and generates the step-by-step representative image (S116).

The step-by-step representative image may be generated at about the same time as the step-by-step text data, or after the text data has been generated. This depends on when the voice data processing unit 130 transmits the time label for the time at which the cooking step was recognized: if the time label is transmitted as soon as the cooking step is recognized, the representative image can be generated at about the same time as the text data; if the time label is transmitted together with the text data, the representative image will be generated after the text data.

The image data processing unit 140 then places the step-by-step representative image at the predetermined position and combines the step-by-step text data provided by the voice data processing unit 130 with the bottom of the representative image, generating cooking step images such as those shown in FIGS. 6 to 9 (S118).

FIG. 6 shows the cooking step image of the preparation step: the name of the dish is written at the top, the representative image of the preparation step is placed below it, and the text data of the preparation step is combined below the representative image.

FIG. 7 shows the cooking step image of step 1: the step number is displayed at the upper left, the representative image of step 1 is placed, and the text data of step 1 is combined below it. FIG. 8 shows the cooking step image of step 2: the step number is displayed at the upper left, the representative image of step 2 is placed, and the text data of step 2 is combined below it.

FIG. 9 shows the cooking step image of the final step, at which cooking ends. When the chef says, for example, "The dish is now complete", the end of cooking is determined by voice recognition of a specific term (such as 'dish' or 'complete') and the final step is recognized. The image data at the time label at which this text is recognized is extracted as the representative image of the final step, and this representative image is combined with the step-by-step text data to generate the cooking step image of the final step, as shown in FIG. 9.

The cooking step images are then combined in time label order to generate a cooking diary for one dish (S120).

In the generated cooking diary, the text or image data in each cooking step image can be changed or supplemented as desired.

As described above, according to the present invention, a cooking diary can be generated simply by shooting a video while the chef cooks.

The above description of the embodiments is merely an example given with reference to the drawings for a more thorough understanding of the present invention and should not be construed as limiting the invention. It will also be apparent to those of ordinary skill in the art that various changes and modifications are possible without departing from the basic principles of the present invention.

100: server
110: video receiver
120: video separator
130: voice data processing unit
140: image data processing unit
200: camera
300: cooking information DB

Claims

A platform device for generating a cooking diary, comprising:
at least one camera for recording a cooking video in which a chef cooks while explaining the cooking process;
a server for receiving the cooking video transmitted from the at least one camera and generating a cooking diary; and
a cooking information DB for storing cooking-related raw materials, cooking methods, cooking utensils, cooking energy information, and text data or voice data related to cooking,
wherein the server comprises:
a video receiver for receiving the cooking video transmitted from the at least one camera;
a video separator for separating the received cooking video into voice data and image data, each assigned a time label;
a voice data processing unit that uses an STT (Speech to Text) module to recognize the voice data and convert it into text data, at the same time recognizes cooking steps divided into a plurality of steps, generates the step-by-step text data recognized within each cooking step, and provides the step-by-step text data so that it is combined with the bottom of the corresponding step-by-step representative image; and
an image data processing unit that, for each cooking step recognized by the voice data processing unit, extracts the image at the time of recognition of the corresponding cooking step and designates it as the step-by-step representative image, places the extracted step-by-step representative image at a predetermined location, and combines the step-by-step text data provided by the voice data processing unit with the bottom of the step-by-step representative image.

The platform device of claim 1,
wherein the cooking video is recorded while the chef utters a specific term for distinguishing each cooking step, and the voice data processing unit recognizes the time at which the specific term is recognized as the starting point of each cooking step.

The platform device of claim 1,
wherein, when recognizing the voice data, the voice data processing unit compares a term having a low voice recognition rate with the terms stored in the cooking information DB and selects and displays the stored term having the highest similarity, and, when the term has no similarity to any stored term or only a low similarity below a certain level, recognizes the term as a new term and stores it in the cooking information DB.

The platform device of claim 2,
wherein the voice data processing unit comprises:
an STT conversion unit for recognizing the voice data and converting it into text data using an STT (Speech to Text) module;
a cooking step recognition unit for recognizing each cooking step through the specific term recognized by the STT conversion unit; and
a step-by-step text data generation unit for providing the step-by-step text data recognized in each cooking step, together with the time label information at the time of recognition of each cooking step, to the image data processing unit so that the text data is combined with the bottom of the step-by-step representative image.

The platform device of claim 4,
wherein the image data processing unit comprises:
a representative image extraction unit for extracting the image at the time of recognition of the corresponding cooking step and designating it as the step-by-step representative image, according to the time label information provided by the voice data processing unit;
an image data input unit for placing the extracted step-by-step representative image at a predetermined location;
a cooking information combining unit for generating a cooking step image by combining the step-by-step text data provided by the voice data processing unit with the bottom of the step-by-step representative image input at a predetermined location;
and a diary generator for generating a cooking diary for one dish by combining the step-by-step cooking step images according to the order of time labels.