KR102345666B1

KR102345666B1 - Conference video providing system using AI

Info

Publication number: KR102345666B1
Application number: KR1020200126082A
Authority: KR
Inventors: 장원철
Original assignee: 주식회사 어반컴플렉스
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2021-12-31

Abstract

The present invention relates to an unmanned conference video providing system using AI. The unmanned conference video providing system comprises: a photographing unit for photographing a meeting area in which a meeting is held by a plurality of participants; a location calculation unit for detecting the faces of the participants from the meeting image provided by the photographing unit, and determining the location of each participant in the meeting area based on the detected information; a speech determination unit for determining whether the participants in the meeting area speak; and a photography control unit for determining the location of a speaker among the participants based on the determination information provided by the speech determination unit and the location calculation unit, and controlling the photographing unit so that a part of the meeting area where the speaker is located can be enlarged and photographed. The unmanned conference video providing system can detect the faces of the participants from the meeting image obtained by photographing the meeting, enlarge the speaker speaking in the meeting among the participants, and photograph the speaker, thereby providing the meeting image capable of accurately delivering the situation and contents of the meeting.

Description

Unmanned conference video providing system using AI

The present invention relates to an unmanned conference video providing system using AI and, more particularly, to an unmanned conference video providing system using AI that uses a camera recording a meeting to identify the participant who is speaking and to capture a magnified view of that speaker.

In a multi-party conference, presentation data is displayed on a screen using a monitor or projector so that the conference contents can be delivered efficiently to the participating users.

However, when a conference is held remotely or its contents are recorded, the video captured by a camera installed in the conference room can be transmitted or recorded, but the person who is presenting or speaking cannot be singled out and filmed. A person attending from a remote location therefore has difficulty identifying the speaker, and the meeting cannot be conveyed efficiently.

In addition, when a meeting is recorded, only video of the entire conference room can be recorded. A viewer of the recording cannot clearly distinguish the speaker and therefore cannot easily grasp the contents of the meeting.

Laid-Open Patent Publication No. 10-2018-0111383: Conference room integrated control system and method therefor

The present invention has been devised to solve the above problems, and its object is to provide an unmanned conference video providing system using AI that can detect the faces of participants in a video of a meeting and capture a magnified view of the participant who is speaking.

To achieve the above object, the unmanned conference video providing system using AI according to the present invention includes: a photographing unit for photographing a conference area in which a conference is held by a plurality of participants; a location calculation unit for detecting the faces of the participants in the meeting image provided by the photographing unit and determining the location of each participant in the conference area based on the detected information; a speech determination unit for determining whether the participants in the conference area are speaking; and a photographing control unit for determining the location of the speaker among the participants based on the determination information provided by the speech determination unit and the location calculation unit, and controlling the photographing unit so that the portion of the conference area where the speaker is located is enlarged and photographed.

The unmanned conference video providing system using AI according to the present invention further includes a plurality of microphones installed in the conference area so that the speech of the participants can be input. The speech determination unit analyzes the operating state of the microphones and, when sound is input to a microphone, determines that the participant adjacent to that microphone has spoken.

The speech determination unit may also identify the mouth shapes of the participants in the meeting image and determine that a participant whose mouth shape changes is speaking.

When the speaker finishes speaking, the photographing control unit may control the photographing unit so that the entire conference area is photographed.

When the speaker moves from the initial position, the photographing control unit may control the photographing unit so that the speaker remains in the shot.

The unmanned conference video providing system using AI according to the present invention may further include a summary editing unit that edits the meeting image provided by the photographing unit to generate a summary video.

Based on the determination information provided by the speech determination unit, the summary editing unit generates the summary video by deleting from the meeting image the footage of silent periods in which no one is speaking.

Based on the determination information provided by the speech determination unit, the summary editing unit may also determine that a quarrel is taking place when an overlapping-speech situation, in which another participant speaks while one participant is already speaking, persists for longer than a preset holding time. It may then generate the summary video by deleting from the meeting image the footage corresponding to the quarrel.

The unmanned conference video providing system using AI according to the present invention detects the faces of the participants in the video of a meeting and captures a magnified view of the participant who is speaking, and therefore has the advantage of providing a meeting video that accurately conveys the situation and contents of the meeting.

FIG. 1 is a conceptual diagram of an unmanned conference video providing system using AI according to the present invention;
FIG. 2 is a block diagram of the unmanned conference video providing system using AI of FIG. 1; and
FIG. 3 is a block diagram of an unmanned conference video providing system using AI according to another embodiment of the present invention.

Hereinafter, an unmanned conference video providing system using AI according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings. Since the present invention can be modified in various ways and can take various forms, specific embodiments are illustrated in the drawings and described in detail in the text. This is not intended to limit the present invention to the specific forms disclosed; the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the drawings, like reference numerals are used for like elements. In the accompanying drawings, the dimensions of structures are shown larger than actual size for clarity.

Terms such as first and second may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly a second element may be referred to as a first element.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" are intended to designate the presence of a feature, number, step, operation, element, part, or combination thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

FIGS. 1 and 2 show an unmanned conference video providing system 100 using AI according to the present invention.

Referring to the drawings, the unmanned conference video providing system 100 using AI includes: a photographing unit 110 for photographing a conference area 11 in which a conference is held by a plurality of participants 12; a location calculation unit 120 for detecting the faces of the participants 12 in the meeting image provided by the main camera 111 of the photographing unit 110 and determining the location of each participant 12 in the conference area 11 based on the detected information; a speech determination unit 130 for determining whether the participants 12 in the conference area 11 are speaking; and a photographing control unit 140 for controlling the photographing unit 110 based on the determination information provided by the speech determination unit 130 and the location calculation unit 120. The conference area 11 is provided with seats for the participants 12, and a plurality of microphones 13 are installed in the conference area 11 adjacent to the seats so that the speech of the participants 12 can be input.

The photographing unit 110 includes a main camera 111 installed at a position facing the conference area 11 to photograph the conference area 11, and a driving unit 112 that drives the main camera 111 in pan and tilt.

The main camera 111 has a predetermined angle of view that allows it to photograph the entire conference area 11, and a camera capable of zooming in and out is used. The driving unit 112 is a driving body that pans and tilts the body of the main camera 111 according to a control signal. It rotates a rotating body extending from the lower part of the main camera 111 left and right and up and down by means of motors. Various structures for such a driving body are known, so a detailed description is omitted.

The location calculation unit 120 analyzes the meeting image captured by the main camera 111 and detects the faces of the participants 12 attending the meeting. It then determines the location of each participant 12 in the conference area 11 based on the detected information. The location calculation unit 120 preferably stores in advance setting information on the seat positions in the conference area 11 where a participant 12 can be seated, and can determine the locations of the participants 12 by comparing the face detection information with this setting information.
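The comparison between face detections and the pre-stored seat setting information can be illustrated with a minimal Python sketch. The patent does not specify the matching rule; nearest-seat matching, the seat coordinates in `SEATS`, the `(x, y, w, h)` box format, and the `max_dist` cutoff are all assumptions made for illustration, and the face boxes are assumed to come from any face detector.

```python
from math import hypot

# Hypothetical seat setting information: seat name -> expected face
# position in the wide-shot image, in pixels (assumed values).
SEATS = {"seat_1": (200, 400), "seat_2": (640, 380), "seat_3": (1080, 400)}


def assign_seats(face_boxes, seats=SEATS, max_dist=150.0):
    """Map detected face boxes (x, y, w, h) to the nearest preset seat.

    Returns {seat name: face centre}. Faces farther than max_dist from
    every seat are ignored; if two faces compete for one seat, the
    closer one wins.
    """
    occupied = {}
    for (x, y, w, h) in face_boxes:
        cx, cy = x + w / 2, y + h / 2
        name, (sx, sy) = min(
            seats.items(),
            key=lambda kv: hypot(kv[1][0] - cx, kv[1][1] - cy),
        )
        dist = hypot(sx - cx, sy - cy)
        if dist <= max_dist and (name not in occupied or dist < occupied[name][1]):
            occupied[name] = ((cx, cy), dist)
    return {name: centre for name, (centre, _) in occupied.items()}
```

For example, `assign_seats([(180, 370, 60, 60)])` maps the single detected face to `seat_1`, because its centre lies 10 pixels from that seat's stored position.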

The speech determination unit 130 analyzes the operating state of the microphones 13 to identify the speaker among the participants 12. That is, when sound is input to a microphone 13, the speech determination unit 130 can determine that the participant 12 adjacent to that microphone 13 has spoken. The speech determination unit 130 transmits this determination information to the photographing control unit 140.
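The microphone-based determination can be sketched as follows. This is only an illustration under stated assumptions: the patent says merely that sound is "input" to a microphone, so the normalized level values, the `threshold` parameter, and the `mic_to_seat` mapping are hypothetical.

```python
def active_speakers(mic_levels, mic_to_seat, threshold=0.1):
    """Return the seats whose adjacent microphone is picking up sound.

    mic_levels: {mic id: current input level, assumed normalized to 0..1}
    mic_to_seat: {mic id: seat adjacent to that microphone}
    threshold: assumed level above which sound counts as "input"
    """
    return [
        mic_to_seat[mic]
        for mic, level in sorted(mic_levels.items())
        if level > threshold and mic in mic_to_seat
    ]
```

A threshold is used here because a real microphone rarely reads exactly zero; the appropriate value would depend on the room and the hardware.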

The speech determination unit 130 may also determine the speaker by analyzing the meeting image provided by the main camera 111. When a participant speaks in a meeting, the shape of the speaker's mouth changes with what is being said, unlike the mouths of the other participants 12. The speech determination unit 130 can therefore identify the mouth shapes of the participants 12 in the meeting image and determine that a participant 12 whose mouth shape changes is the speaker. For this purpose the speech determination unit 130 uses image analysis means conventionally used to analyze facial expressions in video, so a detailed description is omitted.
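One way to approximate "mouth shape changes" is to track how open the mouth is over a short window of frames. The sketch below is an assumption, not the method of the patent: it presumes that some facial landmark detector already supplies four mouth points per frame, and the `min_range` threshold is a hypothetical tuning value.

```python
def mouth_openness(top, bottom, left, right):
    """Ratio of vertical lip gap to mouth width for one frame.

    Each argument is an (x, y) landmark, assumed to be supplied by a
    facial landmark detector.
    """
    width = abs(right[0] - left[0])
    return abs(bottom[1] - top[1]) / width if width else 0.0


def is_speaking(openness_series, min_range=0.15):
    """Treat a participant as speaking when mouth openness varies enough
    across recent frames (min_range is an assumed threshold)."""
    if len(openness_series) < 2:
        return False
    return max(openness_series) - min(openness_series) >= min_range
```

Using the variation rather than the absolute openness avoids flagging a participant whose mouth is simply held open.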

The photographing control unit 140 determines the location of the speaker among the participants 12 based on the determination information provided by the speech determination unit 130 and the location calculation unit 120. That is, the photographing control unit 140 obtains the location of the speaker identified by the speech determination unit 130 from the determination information provided by the location calculation unit 120.

Next, the photographing control unit 140 controls the photographing unit 110 so that the portion of the conference area 11 where the speaker is located is enlarged and photographed. That is, the photographing control unit 140 controls the driving unit 112 so that the speaker is positioned at the center of the shooting range of the main camera 111 and, once the speaker is centered, zooms the main camera 111 in to capture a magnified view of the speaker. Because the speaker is magnified in the meeting video, the speaker's facial expressions and gestures are captured more clearly, and the meeting video conveys the meeting situation more clearly.
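The centering and zoom step can be sketched with simple pinhole-camera geometry. The patent does not give formulas, so everything below is an assumption for illustration: the field-of-view parameters, the sign conventions (positive pan to the right, positive tilt upward), and the `target_fraction` and `max_zoom` values.

```python
import math


def pan_tilt_to_center(face_center, frame_size, hfov_deg, vfov_deg):
    """Pan/tilt angles (degrees) that bring face_center to the image centre.

    Assumes an ideal pinhole camera whose horizontal and vertical fields
    of view at the current zoom are hfov_deg and vfov_deg.
    """
    cx, cy = face_center
    w, h = frame_size
    fx = (w / 2) / math.tan(math.radians(hfov_deg) / 2)  # focal length in px
    fy = (h / 2) / math.tan(math.radians(vfov_deg) / 2)
    pan = math.degrees(math.atan((cx - w / 2) / fx))    # positive: turn right
    tilt = math.degrees(math.atan((h / 2 - cy) / fy))   # positive: tilt up
    return pan, tilt


def zoom_factor(face_height, frame_height, target_fraction=0.3, max_zoom=10.0):
    """Zoom needed for the face to fill target_fraction of the frame height,
    clamped to the range 1.0..max_zoom (both defaults are assumed values)."""
    if face_height <= 0:
        return 1.0
    return max(1.0, min(max_zoom, target_fraction * frame_height / face_height))
```

Centering before zooming, as the description states, keeps the speaker inside the frame while the field of view narrows.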

When the participant 12 identified as the speaker finishes speaking, the photographing control unit 140 controls the driving unit 112 to return the main camera 111 to its initial position so that the entire conference area 11 is photographed, and zooms the main camera 111 out to its initial setting. When the speech determination unit 130 identifies a new speaker, the photographing control unit 140 controls the photographing unit 110 so that the portion of the conference area 11 where the new speaker is located is enlarged and photographed.

In addition, when the speaker moves from the initial position, that is, the position where the speech began, the photographing control unit 140 controls the photographing unit 110 so that the speaker remains in the shot. That is, the photographing control unit 140 can control the driving unit 112 so that the main camera 111 rotates to follow the speaker's movement.
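The behavior described in the preceding paragraphs (close-up while someone speaks, following a speaker who moves, and returning to the wide shot when the speech ends) amounts to a small per-frame decision. The sketch below is a hypothetical illustration; the `Command` type, the mode names, and the participant identifiers are not from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Command:
    """Hypothetical instruction sent to the photographing unit."""
    mode: str                                   # "wide" or "close_up"
    target: Optional[Tuple[float, float]] = None  # point to centre on


def next_command(speaker_id, positions):
    """Choose the shot for the current frame.

    speaker_id: participant currently speaking, or None when nobody is.
    positions: {participant id: current face centre} from face detection.
    Re-reading the position every frame lets the camera follow a speaker
    who leaves the seat where the speech began.
    """
    if speaker_id is None or speaker_id not in positions:
        return Command("wide")  # initial pan/tilt and initial zoom setting
    return Command("close_up", positions[speaker_id])
```

Calling `next_command` once per frame with fresh detections yields a close-up that tracks the speaker and falls back to the wide shot as soon as no speaker is reported.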

The unmanned conference video providing system 100 using AI according to the present invention, configured as described above, detects the faces of the participants 12 in the video of a meeting and captures a magnified view of the participant 12 who is speaking. It therefore has the advantage of providing a meeting video that accurately conveys the situation and contents of the meeting.

FIG. 3 shows an unmanned conference video providing system 200 using AI according to another embodiment of the present invention.

Elements having the same function as in the previously described drawings are denoted by the same reference numerals.

Referring to the drawing, the unmanned conference video providing system 200 using AI further includes a summary editing unit 210 that edits the meeting image provided by the photographing unit 110 to generate a summary video.

The summary editing unit 210 edits into a summary video the meeting image captured by the main camera 111 from the moment the first participant 12 begins to speak until the moment of the last speech by the participants 12. Based on the determination information provided by the speech determination unit 130, the summary editing unit 210 can generate the summary video by deleting from the meeting image the footage of silent periods in which no one is speaking.
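Silence removal reduces to keeping only the time ranges in which someone is speaking. The following sketch assumes the speech determination unit can supply speech intervals as `(start, end)` pairs in seconds; that data format is an assumption, not something the patent specifies.

```python
def keep_segments(speech_intervals):
    """Merge speech intervals into the time ranges kept in the summary.

    speech_intervals: (start, end) pairs in seconds, in any order and
    possibly overlapping. Everything outside the returned ranges is
    silence. The result starts at the first speech and ends at the last.
    """
    merged = []
    for start, end in sorted(speech_intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(segment) for segment in merged]
```

In practice one might also keep pauses shorter than some minimum length so that the summary does not cut within a sentence; the description does not address this.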

Quarrels sometimes break out between participants 12 during a meeting, and the content of a quarrel generally strays from the subject of the meeting. The summary editing unit 210 can therefore generate the summary video by deleting the quarrel footage from the meeting image. Based on the determination information provided by the speech determination unit 130, the summary editing unit 210 determines that a quarrel is taking place when an overlapping-speech situation, in which another participant 12 speaks while one participant 12 is already speaking, persists for longer than a preset holding time. The holding time is set to between 30 seconds and 1 minute, but is not limited to this range; the moderator conducting the meeting may set any duration according to the contents of the meeting. The summary editing unit 210 can then generate the summary video by deleting from the meeting image the footage corresponding to the quarrel.
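The quarrel rule (overlapping speech lasting longer than the holding time) can be illustrated with a sweep over interval start and end events. This is a sketch under assumptions: speech intervals per participant are taken to be available as `(start, end)` pairs in seconds and not to overlap within a single participant, and the default of 30 seconds is the lower bound mentioned in the description.

```python
def quarrel_segments(intervals_by_speaker, hold_seconds=30.0):
    """Find spans where two or more participants speak at once for longer
    than hold_seconds.

    intervals_by_speaker: {participant id: [(start, end), ...]} in seconds.
    Returns the (start, end) spans to delete from the summary.
    """
    events = []
    for intervals in intervals_by_speaker.values():
        for start, end in intervals:
            events.append((start, 1))
            events.append((end, -1))
    # At equal timestamps, process ends before starts so that intervals
    # that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    segments, active, overlap_start = [], 0, None
    for t, delta in events:
        previous = active
        active += delta
        if previous < 2 <= active:
            overlap_start = t
        elif previous >= 2 > active:
            if t - overlap_start > hold_seconds:
                segments.append((overlap_start, t))
            overlap_start = None
    return segments
```

For example, if participant A speaks from 0 to 100 s and participant B speaks from 10 to 50 s and again from 60 to 70 s, only the 40-second overlap from 10 to 50 s exceeds the holding time and is reported.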

As described above, the summary editing unit 210 generates the summary video from the meeting image with the footage of silent periods or quarrels removed, and thus has the advantage of producing a summary video that conveys the contents of the meeting more clearly.

The description of the presented embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the invention. Thus, the present invention is not to be limited to the embodiments presented herein, but is to be construed in the widest scope consistent with the principles and novel features presented herein.

100: unmanned conference video providing system using AI
110: photographing unit
111: main camera
112: driving unit
120: location calculation unit
130: speech determination unit
140: photographing control unit

Claims

1. An unmanned conference video providing system using AI, comprising:
a photographing unit for photographing a conference area in which a conference is held by a plurality of participants;
a location calculation unit for detecting the faces of the participants in the meeting image provided by the photographing unit, and determining the location of each participant in the conference area based on the detected information;
a speech determination unit for determining whether the participants in the conference area are speaking;
a photographing control unit for determining the location of a speaker among the participants based on the determination information provided by the speech determination unit and the location calculation unit, and controlling the photographing unit so that a portion of the conference area in which the speaker is located is enlarged and photographed; and
a summary editing unit that edits the meeting image provided by the photographing unit to generate a summary image,
wherein, based on the determination information provided by the speech determination unit, the summary editing unit determines that a quarrel state exists when an overlapping-speech situation, in which another participant speaks while one of the participants is speaking, persists for longer than a preset holding time, and generates the summary image by deleting the image corresponding to the quarrel state from the meeting image.

2. The system of claim 1,
further comprising a plurality of microphones installed in the conference area so that the speech of the participants can be input,
wherein the speech determination unit analyzes the operating state of the microphones and, when sound is input to one of the microphones, determines that the participant adjacent to that microphone has spoken.

3. The system of claim 1,
wherein the speech determination unit identifies the mouth shapes of the participants in the meeting image and determines that a participant whose mouth shape changes is speaking.

4. The system of claim 2 or 3,
wherein, when the speaker finishes speaking, the photographing control unit controls the photographing unit so that the entire conference area is photographed.

5. The system of claim 1,
wherein, when the speaker moves from the initial position, the photographing control unit controls the photographing unit so that the speaker is photographed by the photographing unit.

6. (Deleted)

7. The system of claim 1,
wherein, based on the determination information provided by the speech determination unit, the summary editing unit generates the summary image by deleting from the meeting image the image of silent periods in which no speaker is present.

8. (Deleted)