KR20110085160A - Stenography input system and method for conference using face recognition

Info

Publication number: KR20110085160A
Application number: KR1020100004784A
Authority: KR
Inventors: 안문학
Original assignee: 주식회사 소리자바
Priority date: 2010-01-19
Filing date: 2010-01-19
Publication date: 2011-07-27
Also published as: KR101077267B1

Abstract

PURPOSE: A face recognition conference shorthand system and method are provided to store and confirm video and audio signals related to a speaker in a conference and to match the video and audio signals with speaker-related information. CONSTITUTION: A camera unit (130) comprises one or more cameras that collect an image including one or more participants attending the conference. A microphone unit (120) comprises one or more microphones that collect an audio signal uttered by at least one speaker among the participants. According to user input and preset conference schedule information, a control unit (160) controls the system so as to collect the audio and video signals of the speaker and to store them as a conference record.

Description

Stenography Input System And Method For Conference Using Face Recognition

The present invention relates to conference shorthand technology, and more particularly to a face recognition conference shorthand system and method that collect video and audio signals for at least one speaker during a meeting so that they can be stored and output per speaker, and that support storing and outputting the speaker's person information together with those signals.

Video conferencing systems, conceived for effective communication between remote users, have proven effective thanks to advances in video compression technology and networks, and their use continues to grow. Recently in particular, functions such as face recognition and speaker analysis are increasingly being combined with such video conferencing systems.

However, most ordinary meetings that take place in everyday life are not provided with any system for separate recording or note-taking, and rely solely on the attendees' own notes. As a result, it is not possible to know clearly, during or after the meeting, which attendee made which remark, and it is difficult to accurately recognize and disseminate the issues discussed at the meeting and the conclusions reached on them.

Accordingly, an object of the present invention is to provide a face recognition conference shorthand system and method that support accurate shorthand transcription of a speaker's remarks during a meeting or during playback of a recorded meeting video.

Another object of the present invention is to provide a face recognition conference shorthand system and method capable of recording the speaker-related video and remarks that arise during a meeting so that they can be used as reference material for future decision-making or work.

A face recognition conference shorthand system according to a preferred embodiment of the present invention, for achieving the objects described above, includes a camera unit, a microphone unit, and a control unit. The camera unit includes at least one camera that collects an image including at least one attendee at a meeting, and the microphone unit includes at least one microphone that collects an audio signal uttered by at least one speaker among the attendees. The control unit performs control so that the speaker's audio signal and the speaker's image are collected according to user input and preset conference schedule information and stored as a conference record.

The present invention also discloses a conference shorthand method including: identifying the position of a speaker who is speaking or is about to speak among at least one attendee at a meeting; collecting an image of the speaker and an audio signal corresponding to the speaker's remarks; and storing the collected image and audio signal of the speaker. The conference shorthand method may further include storing sample information for face recognition of the attendees, storing person information of the attendees, and performing image recognition on the speaker's image to identify the speaker's person information and storing the identified person information together with the speaker's image and audio signal.
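The storing steps of this method can be illustrated with a minimal sketch. It is not the patented implementation: the class names `PersonInfo`, `ConferenceRecord`, and `RecordStore`, the file paths, and the `face_key` string (standing in for whatever a face recognizer would return after matching a stored sample) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PersonInfo:
    """Person information registered for an attendee (hypothetical fields)."""
    name: str
    title: str


@dataclass(frozen=True)
class ConferenceRecord:
    """One stored unit: the speaker's video, audio, and matched person info."""
    start_s: float
    video_path: str
    audio_path: str
    person: Optional[PersonInfo]  # None when the face matched no stored sample


class RecordStore:
    def __init__(self) -> None:
        # face_key is an assumed identifier produced by a face recognizer.
        self._people: Dict[str, PersonInfo] = {}
        self._records: List[ConferenceRecord] = []

    def register(self, face_key: str, person: PersonInfo) -> None:
        """Store person information against a face-recognition sample key."""
        self._people[face_key] = person

    def store(self, start_s: float, video_path: str, audio_path: str,
              face_key: str) -> ConferenceRecord:
        """Store the speaker's video and audio together with person info."""
        record = ConferenceRecord(start_s, video_path, audio_path,
                                  self._people.get(face_key))
        self._records.append(record)
        return record

    def records(self) -> List[ConferenceRecord]:
        return list(self._records)
```

Keeping the person information inside each record, rather than looking it up at display time, mirrors the text's requirement that it be stored "together with" the video and audio signals.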

The identifying of the speaker's position may include at least one of: outputting an overall image including the attendees on a display unit and designating a specific attendee among those in the displayed overall image as the speaker; checking pre-stored schedule information and designating a specific attendee as the speaker according to the schedule information; acquiring an overall image including the attendees, recognizing the face image of each attendee in the overall image, and designating as the speaker an attendee whose image is recognized as speaking; and collecting audio signals through microphones assigned to the attendees and determining the attendee at the position where an audio signal is collected to be the speaker.
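The four alternatives above can be sketched as a single selection function. The text only requires "at least one of" them, so the priority order used here (operator, then schedule, then face, then microphone) is an assumption, as are the `Observation` fields, the score scales, and both thresholds.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Observation:
    """Hypothetical snapshot of the inputs available at one moment."""
    operator_choice: Optional[str] = None  # seat the operator picked on the display
    scheduled_seat: Optional[str] = None   # seat the schedule names for this slot
    mouth_motion: Dict[str, float] = field(default_factory=dict)  # seat -> motion score
    mic_levels: Dict[str, float] = field(default_factory=dict)    # seat -> audio level


def determine_speaker(obs: Observation,
                      motion_threshold: float = 0.5,
                      level_threshold: float = 0.1) -> Tuple[Optional[str], str]:
    """Return (seat, method) for the first alternative that yields a speaker."""
    if obs.operator_choice is not None:
        return obs.operator_choice, "operator"
    if obs.scheduled_seat is not None:
        return obs.scheduled_seat, "schedule"
    if obs.mouth_motion:
        # Pick the face whose image looks most like it is speaking.
        seat, score = max(obs.mouth_motion.items(), key=lambda kv: kv[1])
        if score >= motion_threshold:
            return seat, "face"
    if obs.mic_levels:
        # Pick the seat whose assigned microphone collects the strongest signal.
        seat, level = max(obs.mic_levels.items(), key=lambda kv: kv[1])
        if level >= level_threshold:
            return seat, "microphone"
    return None, "none"
```

Returning the method alongside the seat lets a caller log how each speaker was identified, which may help when reviewing a record later.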

According to the face recognition conference shorthand system and method of the present invention, video and audio signals related to a speaker during a meeting can be stored and reviewed, and speaker-related information is matched with them, so that a variety of meeting-related information can be stored, managed, and searched.

The present invention can also support accurate shorthand transcription of a speaker's remarks during a meeting or during playback of a recorded meeting video. In an ordinary meeting in which several speakers take part, it can be hard to tell whose remark is being heard because voices or appearances are similar, and shorthand can be difficult (for example in verifying the truth or nuance of a remark, identifying the speaker, or confirming pronunciation) because of foreign languages, technical terms, dialects, or slurred pronunciation. By recognizing the attendees' faces during the meeting and distinguishing the current speaker, the invention makes shorthand easier; and by separately classifying, storing, or playing back only a specific speaker's remarks, it supports more accurate shorthand.
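Classifying and replaying only one speaker's remarks can be sketched as a grouping step over labelled segments. The `Segment` tuple layout and the function names are hypothetical; the sketch assumes each stored segment already carries a speaker label and start and end times in seconds.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (speaker, start_s, end_s): an assumed layout for one stored remark.
Segment = Tuple[str, float, float]


def group_by_speaker(segments: Iterable[Segment]) -> Dict[str, List[Tuple[float, float]]]:
    """Classify segments per speaker, each list sorted by start time."""
    grouped: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for speaker, start_s, end_s in segments:
        grouped[speaker].append((start_s, end_s))
    for spans in grouped.values():
        spans.sort()
    return dict(grouped)


def playlist_for(segments: Iterable[Segment], speaker: str) -> List[Tuple[float, float]]:
    """Time spans to replay when only one speaker's remarks are wanted."""
    return group_by_speaker(segments).get(speaker, [])
```

A stenographer reviewing a recording could then step through the spans returned by `playlist_for` instead of scrubbing through the whole meeting.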

FIG. 1 is a view illustrating an example of a conference situation according to an embodiment of the present invention;
FIG. 2 is a block diagram schematically showing the configuration of a face recognition conference shorthand system according to an embodiment of the present invention;
FIG. 3 is a view showing the configuration of the control unit of FIG. 2 in more detail;
FIG. 4 is a view showing an example of a screen interface provided for the conference recording support process of the present invention; and
FIG. 5 is a flowchart illustrating a conference shorthand method according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that the following description covers only the parts necessary for understanding operations according to embodiments of the present invention, and that description of other parts is omitted so as not to obscure the gist of the invention.

The terms and words used in the specification and claims described below should not be construed as limited to their ordinary or dictionary meanings. Rather, on the principle that an inventor may appropriately define the concepts of terms in order to describe his or her own invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing this application.

FIG. 1 is a view illustrating an example of a conference situation that can be supported by the face recognition conference shorthand system according to an embodiment of the present invention.

Referring to FIG. 1, at the start of a meeting a number of attendees take their respective seats at the conference table and conduct a meeting of some form. Each attendee is preferably seated at the place determined by the conference schedule information. The face recognition conference shorthand system of the present invention broadly includes a camera unit 130 and a microphone unit 120, and may include an electronic device capable of storing, outputting, and managing the video and audio signals collected by the camera unit 130 and the microphone unit 120. A plurality of microphones may be provided so that one can be placed at each seat in the conference room. Each of the microphones of the microphone unit 120 so provided may be assigned an ID number to distinguish the audio information collected at each seat. Although the face recognition conference shorthand system has been described here as including a plurality of microphones, the present invention is not limited thereto; only a single microphone may be placed in the conference room.

Meanwhile, the camera unit 130 is preferably positioned at an angle from which the faces of the attendees in the conference room can be photographed. For example, when six attendees are seated around a table, the camera unit 130 may be arranged at an angle from which the face of each of the six attendees can be photographed. Depending on the shape of the table, however, it may not actually be possible to photograph every attendee's face. The face recognition conference shorthand system of the present invention may therefore place several cameras at specific points so that each attendee's face can be photographed. When a plurality of cameras are deployed, the system may assign an ID number to each camera and identify which camera a collected video signal is received from.

Although the conference room shown in FIG. 1 has the attendees arranged in parallel rows around a rectangular table, the present invention is not limited thereto. That is, the environment to which the face recognition conference shorthand system can be applied is not limited by the form of the conference room or by the equipment installed in it; the system is applicable to any environment in which cameras capable of photographing the front or side faces of the attendees and microphone devices capable of collecting the attendees' remarks can be suitably arranged. Because a plurality of cameras and microphones can be operated in the camera unit 130 and the microphone unit 120, the system can in practice support ordinary meeting proceedings without particular restriction.

Hereinafter, the configuration of the face recognition conference shorthand system of the present invention, which is placed in the conference room to support management of the various conference records discussed there, will be described in more detail with reference to FIG. 2.

FIG. 2 is a block diagram schematically illustrating the configuration of a face recognition conference shorthand system according to an embodiment of the present invention.

Referring to FIG. 2, the face recognition conference shorthand system 100 of the present invention may include an input unit 110, a microphone unit 120, a camera unit 130, a display unit 140, a storage unit 150, and a control unit 160. The microphone unit 120 and the camera unit 130 may be distributed across specific points or locations in the conference room. Accordingly, the system 100 may further include a communication interface capable of receiving video and audio signals from the microphone unit 120 and the camera unit 130. In this case, the microphone unit 120 and the camera unit 130 may also include components capable of forming a communication channel with the communication interface of the system 100. For example, the microphone unit 120 and the camera unit 130 may be connected to the control unit 160 of the system 100 by wire, such as a cable, or wirelessly. That is, the communication interface may be implemented in various wired or wireless forms according to the intention of the operator of the system 100 or of the designers configuring it. The system 100 may further include an audio processing unit including a speaker for outputting the collected audio signals.

Meanwhile, when a meeting starts, the face recognition conference shorthand system 100 having the above configuration may control activation of the microphone unit 120 and the camera unit 130 according to the conference schedule information stored in the storage unit 150. For example, when the conference schedule information plans for the attendee at the position designated as the first seat to speak, the system 100 may activate the microphone of the microphone unit 120 placed at the first seat and control the camera unit 130 to collect an image corresponding to the face of the attendee seated there. The system 100 supports output of an overall image of the entire conference room and, through image recognition of the captured image, may separate the attendees from the surrounding background, or the attendees' faces from the surrounding background. When the meeting moderator or meeting administrator designates a specific attendee, or a specific attendee's face, using the input unit 110 and the display unit 140, the system 100 may activate the microphone of the microphone unit 120 placed at that attendee's position and collect and store together the audio signal uttered by the attendee and the video signal relating to the attendee. To this end, after separating the attendees or their faces from the surrounding background in the overall image, the system 100 may link a microphone of the microphone unit 120 to each resulting object so that an attendee or an attendee's face can be designated on the display unit 140. When the administrator then designates a specific object, the system 100 may activate the microphone corresponding to the designated object. The system 100 may refer to the conference schedule information to match objects with microphones, and may also determine through image recognition whether the feature values of the designated object show a particular change associated with speaking, for example a change in mouth shape. In this way, when various attendees in the conference room speak according to the conference schedule information and the designation of the meeting administrator, the system 100 can collect and store the video signal of the speaker and the audio signal corresponding to the remarks. In addition, when a particular attendee wishes to speak, the system 100 may detect the audio signal collected from the microphone assigned to that speaker and automatically collect the video signal corresponding to that speaker. The system 100 may also recognize the attendees' faces to determine which attendee is speaking, automatically activate the microphone assigned to that attendee, and then collect the attendee's video and audio signals. Each component of the face recognition conference shorthand system 100 is described in more detail below.
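The linking of on-screen face objects to microphones can be sketched as a hit test over bounding boxes. The `FaceObject` fields, the pixel coordinates, and the choice to deactivate every other microphone on designation are assumptions for illustration; the text does not say how the segmentation is computed or whether other microphones stay on.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class FaceObject:
    """A face region separated from the background, linked to a microphone."""
    seat: str
    box: Tuple[int, int, int, int]  # x, y, width, height in display pixels
    mic_id: int


def object_at(objects: Iterable[FaceObject], x: int, y: int) -> Optional[FaceObject]:
    """Return the face object under the point the operator designated, if any."""
    for obj in objects:
        bx, by, bw, bh = obj.box
        if bx <= x < bx + bw and by <= y < by + bh:
            return obj
    return None


def on_operator_designation(objects: Iterable[FaceObject], x: int, y: int,
                            active_mics: Set[int]) -> Optional[int]:
    """Activate the microphone linked to the designated object.

    Assumption: only the designated attendee's microphone stays active.
    A point outside every face object leaves the active set unchanged.
    """
    obj = object_at(objects, x, y)
    if obj is None:
        return None
    active_mics.clear()
    active_mics.add(obj.mic_id)
    return obj.mic_id
```

With a touch-screen display unit, `x` and `y` would come from the touch event; with a mouse, from the click position.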

The input unit 110 includes a plurality of input keys and function keys for receiving numeric or character information entered by the user and for setting various functions. The function keys may include direction keys, side keys, and shortcut keys set to perform specific functions. The input unit 110 also transmits to the control unit 160 key signals entered in connection with user settings and with function control of the camera unit 130 and the microphone unit 120. In particular, the input unit 110 generates various input signals for controlling image capture. For example, the input unit 110 generates, and transmits to the control unit 160, an input signal corresponding to the shutter function of the camera unit 130, input signals for searching, storing, and deleting the various images displayed on the display unit 140, an input signal corresponding to a zoom-in or zoom-out function, and input signals corresponding to the various photographing option functions provided by the camera unit 130. The photographing option functions include changing illuminance, changing resolution, inserting a filter, and providing image preprocessing at the time of capture. The input unit 110 may receive shorthand input according to the user's key selections. The input unit 110 may be implemented in various forms, such as a keyboard, a keypad, a mouse, or a voice input device. When the display unit 140 is manufactured as a touch screen, the display unit 140 may also serve as the input unit 110.

The microphone unit 120 includes at least one microphone placed in the conference room to collect the attendees' remarks. The microphones of the microphone unit 120 may be USB microphones, and microphones with various characteristics, for example a specific directivity, may be used as appropriate. The microphones may be activated and deactivated under the control of the control unit 160. For example, assume that the microphone unit 120 includes six microphones, each placed at a specific position where it can collect the audio signal of an attendee's remarks. Each of the six microphones may then be assigned a particular ID, and their activation times may differ under the control of the control unit 160. That is, the control unit 160 may change the times at which the microphones are activated in sequence according to the conference schedule information, so that the microphones of the microphone unit 120 are activated one after another. The activation time of the microphones may also be changed by an input signal from the input unit 110, and, as described above, a specific microphone may be activated according to the result of image recognition performed on the video signal collected by the camera unit 130.
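The schedule-driven sequential activation described for the six-microphone example might look like the following sketch. The `ScheduleEntry` layout (start and end in seconds plus a microphone ID) and the `manual_override` parameter, which stands in for an input-unit signal, are assumptions; the text does not define the schedule format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class ScheduleEntry:
    """One slot of the conference schedule (hypothetical layout)."""
    start_s: int
    end_s: int
    mic_id: int


class MicrophoneUnit:
    def __init__(self, mic_ids: Iterable[int]) -> None:
        self.mic_ids: Set[int] = set(mic_ids)  # IDs assigned to the microphones
        self.active: Set[int] = set()

    def activate(self, mic_id: int) -> None:
        if mic_id not in self.mic_ids:
            raise KeyError(f"unknown microphone ID {mic_id}")
        self.active.add(mic_id)

    def deactivate(self, mic_id: int) -> None:
        self.active.discard(mic_id)

    def apply_schedule(self, schedule: Iterable[ScheduleEntry], now_s: int,
                       manual_override: Optional[int] = None) -> List[int]:
        """Activate the microphones due at now_s; an operator override wins."""
        if manual_override is not None:
            wanted = {manual_override}
        else:
            wanted = {e.mic_id for e in schedule if e.start_s <= now_s < e.end_s}
        for mic_id in self.active - wanted:  # set difference is a new set
            self.deactivate(mic_id)
        for mic_id in wanted:
            self.activate(mic_id)
        return sorted(self.active)
```

Calling `apply_schedule` periodically with the elapsed meeting time moves activation from one microphone to the next as each slot begins.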

The camera unit 130 includes at least one camera that captures an image corresponding to all of the attendees in the conference room or to at least one of them. The camera unit 130 captures an image, receives it as a data signal, and transmits the data signal to the control unit 160. The data signal captured by the camera unit 130 may be stored temporarily or semi-permanently, according to user selection, together with the audio signal collected by the microphone unit 120. The camera unit 130 may capture an image of a subject and deliver it to the display unit 140 as a preview screen, and may focus on the subject in response to a specific part of the subject image. In practice, the camera unit 130 is controlled by the control unit 160 to photograph the entire conference room, a specific person in the conference room, or a specific person's face, and transmits the video signal collected in the process to the control unit 160. Under the control of the control unit 160 it may also focus on and photograph a specific person or that person's face, and the focused image may likewise be transmitted to the control unit 160.

The display unit 140 displays the conference schedule information, the video signal collected by the camera unit 130, a screen corresponding to the audio signal collected by the microphone unit 120, and various menus, as well as information entered by the user and information screens provided to the user. In particular, the display unit 140 may output the image captured by the camera unit 130, speaker person information corresponding to the result of image recognition on the captured image, and icons that allow the user to select and review information on the speakers collected during the meeting, the times at which they spoke, and the content of their remarks. The display unit 140 may provide an image of the subject at which the camera unit 130 is currently pointed as a preview screen. The preview function shows the image of the subject entering through the lens before the camera unit 130 actually captures it. Based on this preview function, the display unit 140 may display an image captured by the camera unit 130 before it is stored. The display unit 140 may also display a confirmation message asking the user whether to store or delete an image. In this process, the display unit 140 may output the video signal of the speaker photographed by the camera unit 130 and information corresponding to the result of image recognition on the photographed speaker, and may output this information in list form. The display unit 140 may also output a voice waveform corresponding to the audio signal collected by the microphone unit 120. The screen interface of the display unit 140 is described in more detail later with reference to the drawings.

The storage unit 150 stores the application programs required for the functions according to the embodiment of the present invention, and temporarily or semi-permanently stores images acquired during capture. It also provides a buffering function that temporarily holds a captured image before it is stored or deleted, and a temporary storage function for consecutive pictures in a photographing mode in which a specific image is selected from a number of consecutive images. The storage unit 150 may broadly include a program area and a data area.

The program area stores, among other things, the operating system (OS) that boots the face recognition conference shorthand system 100. When the face recognition conference shorthand system 100 supports optional functions such as a camera function, a sound playback function, or an image or video playback function, the program area also stores the application programs needed to support them. In particular, the program area of the present invention includes a conference record support application (conference record support.app) for conference stenography.

The conference record support application (conference record support.app) controls, according to preset conference schedule information and user input signals, when at least one microphone in the microphone unit 120 is activated; controls the activation and focusing of at least one camera in the camera unit 130; and controls the storage, or the output to the display unit 140, of the video and audio signals collected through the microphone unit 120 and the camera unit 130. To this end, the conference record support application may include: a routine for checking the conference schedule information; a routine for controlling the microphone unit 120 according to the conference schedule information and user input; a routine for controlling image capture by the camera unit 130 according to the conference schedule information and user input; a routine for activating an image recognition algorithm that recognizes the captured image; a routine for retrieving the person information for the captured image based on the recognition data; a routine for controlling the integrated storage of at least one of the audio signal collected by the microphone unit 120, the video signal collected by the camera unit 130, and the person information; and a routine for outputting at least one of the audio signal, the video signal, and the person information to the display unit 140.

The image recognition algorithm recognizes the image of a subject: it compares the subject's shape, motion state, and so on with previously stored image samples to determine what the image shows. For example, when the subject image is of a person's face, the algorithm may extract the positions of the eyes, the nose, and the mouth from the face and, based on changes in the distances between those positions and changes in the positions themselves, determine what facial expression the person's current face reflects.

In particular, when at least one camera is used in a system that records speakers during a conference, the image recognition algorithm of the present invention finds the camera covering the place where the speaker is located and recognizes the face collected through that camera. Here, the conference record support application may determine the speaker's position through the microphone unit 120 and, using the principle that the magnitude of a voice signal decreases with distance, decide which camera to activate.
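By way of non-limiting illustration, the camera selection described above might be sketched as follows. The sketch assumes one block of audio samples per microphone and a fixed microphone-to-camera table; the names `rms`, `pick_camera`, `mic_a`, `cam_1`, and so on are hypothetical and do not appear in the specification.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of one block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def pick_camera(mic_blocks, mic_to_camera):
    """Return the camera mapped to the loudest microphone.

    Assumption: the microphone nearest the speaker receives the
    strongest signal, since amplitude falls off with distance.
    """
    loudest = max(mic_blocks, key=lambda mic_id: rms(mic_blocks[mic_id]))
    return mic_to_camera[loudest]

# Hypothetical sample blocks captured at the same instant.
blocks = {"mic_a": [0.01, -0.02, 0.01], "mic_b": [0.30, -0.25, 0.28]}
cameras = {"mic_a": "cam_1", "mic_b": "cam_2"}
selected = pick_camera(blocks, cameras)  # camera nearest the loudest microphone
```

A practical system would likely smooth the amplitude over several blocks before switching cameras, so that a brief noise does not cause a switch.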

Meanwhile, to run the image recognition algorithm, the conference record support application may perform a face region extraction process and a normalization process. In more detail, to extract face regions from the image obtained by the camera at the position where the speaker is estimated to be, the conference record support application first performs histogram equalization on the whole image. It then detects the multiple face regions present in the image using Haar-like feature values. For each face region, the surrounding background is removed based on skin color information, and only the face object is extracted and normalized. Normalization assumes that the face object is an ellipse, finds its center of gravity and major axis, and uses them to correct the face so that it is not tilted; bilinear interpolation is then used to scale face objects of various sizes to a fixed size. The face object normalization process may include the following steps: taking the face image object, finding the center of gravity of the face region, translating the region coordinates, computing the covariance matrix, computing the eigenvalues, computing the eigenvectors, computing the tilt (angle), rotating the face region, scaling the face region, and obtaining the normalized face image object.
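The tilt estimation step of the normalization process may be illustrated by the following minimal sketch. It assumes that the face object is given as a list of (x, y) pixel coordinates with the y axis pointing upward, and it uses the closed-form orientation of the principal eigenvector of a 2x2 covariance matrix rather than an explicit eigen-decomposition. The function names are hypothetical.

```python
import math

def major_axis_angle(points):
    """Angle (radians, from the x axis) of the major axis of a 2-D point set."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    # Center of gravity of the region.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the covariance matrix of the centered coordinates.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Orientation of the eigenvector with the largest eigenvalue.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def upright_correction(points):
    """Rotation (radians) that would bring the major axis to vertical."""
    return math.pi / 2 - major_axis_angle(points)

# Hypothetical face pixels lying along the line y = 2x.
tilted_face = [(0, 0), (1, 2), (2, 4), (3, 6)]
correction_deg = math.degrees(upright_correction(tilted_face))
```

In image coordinates, where the y axis usually points downward, the sign of the angle would be reversed; the subsequent rotation and bilinear scaling are not shown.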

In the face recognition process, the conference record support application then verifies identity using the feature values of the normalized face object. The conference record support application may use the center points of both eyes, the center point of the nose, and the position of the upper lip as feature values. In the face comparison process, the conference record support application may use the Euclidean distance and judge that the smaller the distance, the more similar the faces. The feature values are mapped onto a face comparison coordinate system, and applying them in this way makes it possible to account for distortion of the face region caused by image resizing or noise, and for up-down and left-right rotation of the face due to unconstrained posture. The conference record support application may use ratios between feature values as the face comparison coordinates.

For example, if in two-dimensional coordinates the X-axis position of the nose is Xnose, the X-axis position of the left eye is Xleye, the X-axis position of the right eye is Xreye, the Y-axis position of the nose is Ynose, the Y-axis position of the eyes is Yeye, and the Y-axis position of the mouth is Ymouse, then dx = |Xnose - Xleye| / |Xnose - Xreye| and dy = |Ynose - Yeye| / |Ynose - Ymouse| can be defined.
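The two ratios defined above may be computed, for illustration only, as in the following sketch. The function name `face_ratios` and the sample coordinates are hypothetical; the argument `y_mouth` corresponds to Ymouse in the text.

```python
def face_ratios(x_nose, x_leye, x_reye, y_nose, y_eye, y_mouth):
    """Return (dx, dy), the face comparison coordinates defined in the text."""
    den_x = abs(x_nose - x_reye)
    den_y = abs(y_nose - y_mouth)
    if den_x == 0 or den_y == 0:
        # The ratios are undefined when the landmarks coincide on an axis.
        raise ValueError("degenerate landmark positions")
    dx = abs(x_nose - x_leye) / den_x
    dy = abs(y_nose - y_eye) / den_y
    return dx, dy

# Hypothetical landmarks of a symmetric, frontal face.
frontal = face_ratios(50, 30, 70, 60, 40, 80)
```

Because both coordinates are ratios of distances, they are unchanged when the whole face image is scaled uniformly, which is consistent with the stated tolerance to image resizing.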

The conference record support application can then compute the distance between a previously collected sample image, that is, a face template (dx_t, dy_t), and the facial feature values (dx, dy) obtained during the actual conference recording, using Equation 1 below.
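Equation 1 itself is not reproduced in this text. Since the description states that the Euclidean distance is used for face comparison, a plausible reading is the ordinary Euclidean distance between (dx, dy) and (dx_t, dy_t); the sketch below makes that assumption. The template names are hypothetical.

```python
import math

def template_distance(features, template):
    """Euclidean distance between (dx, dy) and a template (dx_t, dy_t)."""
    dx, dy = features
    dx_t, dy_t = template
    return math.hypot(dx - dx_t, dy - dy_t)

def closest_template(features, templates):
    """Name of the stored template at the smallest distance."""
    return min(templates, key=lambda name: template_distance(features, templates[name]))

# Hypothetical templates collected before the conference.
templates = {"participant_a": (1.0, 1.2), "participant_b": (0.8, 1.5)}
match = closest_template((0.98, 1.25), templates)
```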

Because the feature values used for the coordinates change more under left-right rotation than under up-down rotation, left-right rotation of the face must be taken into account when extracting them. Accordingly, the conference record support application extracts edges from the normalized face region object using the Canny method and, based on these edges, finds the horizontal feature lines of the eyebrows, eyes, nose, and mouth. To account for left-right rotation, the conference record support application splits the face region into left and right parts and collects edges from each, processes the middle region in an overlapping pass, merges the collected edges, and selects representative values.

The conference record support application uses the representative value of the nose to judge left-right rotation of the face. It assumes that in a frontal face image the nose lies at the horizontal midpoint of the whole face region, judges that the face has rotated when the nose representative value is shifted to the left or right, and corrects for this. The conference record support application calculates the angle needed to move the off-center nose representative value to the center and uses this angle to transform the representative values of both eyes. This relies on the assumption that the feature points of the nose and the eyes lie on the same circumference on the face. To correct the errors that result, the face recognition process preferably obtains the face comparison coordinates and computes the distance using the average of feature values obtained from four or more frames.

Meanwhile, when several template coordinates are measured to be at the same distance from the facial feature value coordinates, the conference record support application may select among the templates by quadrant, taking the facial feature value coordinates as the reference point, in the order third, second, fourth, first quadrant. This ordering reflects the lower error rate of up-down rotation compared with left-right rotation and gives priority to templates whose feature values differ less.
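The tie-breaking rule may be sketched as follows. The specification does not state how quadrants are delimited, so the sketch assumes the usual mathematical convention with the facial feature coordinates as origin and assigns points lying on an axis to the right or upper side; the tolerance `tol` used to decide that two distances are equal is also an assumption.

```python
import math

# Lower value means higher priority: quadrant order 3, 2, 4, 1.
QUADRANT_PRIORITY = {3: 0, 2: 1, 4: 2, 1: 3}

def quadrant(origin, point):
    """Quadrant (1 to 4) of `point` relative to `origin`."""
    right = point[0] >= origin[0]
    upper = point[1] >= origin[1]
    if right and upper:
        return 1
    if upper:
        return 2
    if not right:
        return 3
    return 4

def pick_template(features, templates, tol=1e-9):
    """Nearest template; ties are resolved in quadrant order 3-2-4-1."""
    best = min(math.dist(features, t) for t in templates.values())
    tied = [name for name, t in templates.items()
            if math.dist(features, t) - best <= tol]
    return min(tied, key=lambda name: QUADRANT_PRIORITY[quadrant(features, templates[name])])

# Three hypothetical templates, all equally distant from the measured features.
candidates = {"t_q1": (1, 1), "t_q3": (-1, -1), "t_q4": (1, -1)}
chosen = pick_template((0, 0), candidates)
```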

After the face recognition process described above, the conference record support application performs speaker recognition, using the feature values and face objects obtained during face recognition. By the nature of the speaker recognition problem, several temporally consecutive frames are required, and the number of frames per second used for the decision should preferably be guaranteed to be above a certain level.

The recognition process first computes the variance of the edge scatter of the lip region below the nose and measures its rate of change. The face object whose measured rate of change is largest over a given period is recognized as the speaker, while a face object whose mouth region changes too little across consecutive frames is judged not to be a speaker and may be excluded from speaker recognition.

To obtain the rate of change, the edge scatter around the mouth is first determined, as in Equation 2, from the sum of the numbers of points at which each horizontal line meets the edges.

Here, n is the length of the mouth region, i is the index of each horizontal line, f_i is the number of points at which the i-th horizontal line meets the edges, and m is the mean. The mean m may be given by Equation 3 below.
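Equations 2 and 3 are not reproduced in this text. Based on the symbol definitions above, one plausible reading is that m is the mean of the counts f_i and that the scatter is their variance about m; the sketch below is written under that assumption. It takes a binary edge map of the mouth region, one list of 0/1 pixels per horizontal line, and the function names are hypothetical.

```python
def crossings_per_row(edge_rows):
    """f_i: number of edge pixels met by the i-th horizontal line."""
    return [sum(1 for px in row if px) for row in edge_rows]

def scatter_variance(edge_rows):
    """Variance of the per-row edge counts (assumed form of Equation 2)."""
    f = crossings_per_row(edge_rows)
    n = len(f)                      # number of horizontal lines in the mouth region
    m = sum(f) / n                  # mean count (assumed form of Equation 3)
    return sum((fi - m) ** 2 for fi in f) / n

# Hypothetical 3-row edge map of a mouth region.
mouth_edges = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
variance = scatter_variance(mouth_edges)
```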

The rate of change of the scatter can be obtained from Equation 4 below, and whether a face is included among the speaker recognition candidates can be decided with reference to the value T.
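Equation 4 is not reproduced in this text, so its exact form is unknown. Purely as an illustration of the decision rule, the sketch below assumes that the rate of change is the mean absolute difference of the scatter variance between consecutive frames, excludes faces whose rate falls below the threshold T, and selects the face with the largest rate. All names and values are hypothetical.

```python
def change_rate(variances):
    """Mean absolute frame-to-frame change (an assumed stand-in for Equation 4)."""
    diffs = [abs(b - a) for a, b in zip(variances, variances[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0

def pick_speaker(variance_tracks, threshold):
    """Face with the largest change rate, ignoring faces below the threshold T."""
    rates = {face: change_rate(v) for face, v in variance_tracks.items()}
    candidates = {face: r for face, r in rates.items() if r >= threshold}
    if not candidates:
        return None                 # nobody appears to be speaking
    return max(candidates, key=candidates.get)

# Hypothetical per-frame scatter variances for two tracked faces.
tracks = {
    "face_1": [2.0, 2.1, 2.0, 2.1],   # mouth nearly still
    "face_2": [1.0, 4.0, 0.5, 3.5],   # mouth moving
}
speaker = pick_speaker(tracks, threshold=0.5)
```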

Once the speaker is recognized, the conference record support application may control the pan, tilt, and zoom modules of the camera unit 130 to adjust the camera position so that the speaker is placed at the center of the image.

Meanwhile, the data area stores the conference record data generated by the operation of the face recognition conference shorthand system 100. That is, the data area may store, on user operation or automatically, a conference record containing the audio signal collected by the microphone unit 120 and the video signal collected by the camera unit 130. The data area may also store sample information for each speaker, used to identify the speaker through image recognition, and the speaker's person information. The sample information is provided for image recognition, and the speaker's person information may be output to the display unit 140 or stored separately together with the audio and video signals. For more accurate image recognition, the sample information may include image feature value information from various aspects of each participant's face, such as the front, the right side, and the left side. The conference record data may also include the stenographic notes that the user enters through the input unit 110 while watching the conference video being recorded in real time or while playing back the recorded conference video.

The control unit 160 controls the overall operation of the face recognition conference shorthand system 100 and the signal flow between its internal blocks. In particular, the control unit 160 of the present invention may control the checking of schedule information needed as the conference proceeds, the collection of the speaker's audio and video signals, the generation of data through image recognition, and storage and output. In this process, the control unit 160 controls the camera unit 130 to recognize each speaker's face individually in the overall image showing several speakers and to focus on the speaker's face. After the conference starts, when the user clicks a specific face region with the input unit 110 on the display unit 140 showing the faces focused by the camera unit 130, or when an individual speaker is selected automatically according to the preset conference schedule information, the control unit 160 may control the display so that the selected speaker's video and audio are shown separately and that speaker's timeline within the overall video and voice waveform are displayed on the display unit 140. The separate display of the speaker's video may be triggered manually by the user through the input unit 110 or automatically. In the automatic case, the control unit 160 controls the display unit 140 to automatically output the video of the speaker who is speaking according to the conference schedule information and the voice waveform obtained by analyzing that speaker's audio signal. After the current speaker's video has been displayed separately and that speaker finishes speaking, the control unit 160 removes the video signal and the voice waveform corresponding to that speaker from the display unit 140, collects the video of the next speaker according to the conference schedule information, and outputs it to the display unit 140. When a specific speaker's face is selected in video that is being recorded and played back in real time, or in the complete recorded video, the control unit 160 may control the output so that only the selected speaker's voice and video are output through the display unit 140 and the audio processing unit. The detailed configuration of the control unit 160 is described in more detail with reference to FIG. 3.

Meanwhile, each time the number of speakers increases, the control unit 160 recognizes each speaker's face and controls the storage and output of the separated video and voice waveform for each speaker. In doing so, it may apply a different background color or a highlight so that the video and voice waveform of the person currently speaking are distinguished from the previously stored video and voice waveforms of other speakers. The video associated with a speaker may include the speaker's image, the speaker's name, the speech start time, the speech end time, and so on.

As described above, the face recognition conference shorthand system 100 according to the embodiment of the present invention can be implemented at relatively low installation cost when web cameras are used for the camera unit 130 and USB microphones are used for the microphone unit 120. Each module of the face recognition conference shorthand system 100 may be configured independently so that changes in equipment or environment can be handled flexibly. The face recognition conference shorthand system 100 may also be implemented so that, during video recording, conference records are made more user-centered on the basis of face recognition information and speaker information. In addition, as described for the image recognition algorithm above, the face recognition conference shorthand system 100 extracts the feature values used in face recognition so that they are robust to face rotation, which helps it recognize participants in varied postures in an unconstrained conference setting. Through speaker position analysis, face region extraction and normalization, face recognition, and speaker recognition, the face recognition conference shorthand system 100 can recognize the speaker and collect the video signal corresponding to that speaker and the audio signal of what the speaker says.

FIG. 3 is a diagram showing in more detail the configuration of the control unit 160 of the face recognition conference shorthand system 100 according to an embodiment of the present invention.

Referring to FIG. 3, the control unit 160 of the present invention includes a conference scheduling unit 161, an audio collection unit 163, an image recognition and collection unit 165, and a conference record storage and output unit 167.

The conference scheduling unit 161 manages conference scheduling, such as the collection, storage, and output of the video and audio signals needed for the conference. It determines the position of a specific speaker by checking the conference schedule information stored in the storage unit 150, by analyzing an input signal entered by the user through the input unit 110, by checking the position of the microphone at which an audio signal is being collected, or by performing face recognition on the participants in the overall image collected by the camera unit 130. It then controls the microphone unit 120 and the camera unit 130 to collect that speaker's video and the audio signal of the speech. The conference scheduling unit 161 may also control recognition of the speaker's image collected by the camera unit 130, check the participant person information previously stored in the storage unit 150, and retrieve the person information corresponding to the recognized speaker. It may then pass the resulting data, that is, the audio signal, the video signal, and the person information, to the conference record storage and output unit 167. The conference scheduling unit 161 may request the image recognition and collection unit 165 to activate the camera unit 130 and acquire an image of all participants in the conference room. Once that image is acquired, the conference scheduling unit 161 may pass it to the conference record storage and output unit 167 for output to the display unit 140. When a user input indicating a specific speaker in the overall image shown on the display unit 140 occurs, the conference scheduling unit 161 may activate the microphone assigned to that speaker to collect the audio signal, collect the speaker's video, and have the collected video recognized to obtain the speaker's person information. When there is no user input, or when separate conference schedule information exists, the conference scheduling unit 161 may determine the position of the next speaker from the conference schedule information, activate the microphone at that position, and focus on the speaker's image there. When the active microphone collects no audio signal for a preset time, or when an input signal corresponding to a transfer of the floor occurs, the conference scheduling unit 161 may activate the microphone at the next speaker's position according to the conference schedule information, or the microphone at the position to which the floor has been transferred.
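The hand-over behavior of the conference scheduling unit 161 may be illustrated by the following minimal sketch. It assumes that the conference schedule information reduces to an ordered list of seats and that voice activity is reported as a boolean at each time step; the class `FloorScheduler` and its methods are hypothetical and are not named in the specification.

```python
class FloorScheduler:
    """Tracks which scheduled seat currently holds the floor."""

    def __init__(self, speaking_order, silence_timeout):
        self.order = list(speaking_order)   # seats in scheduled speaking order
        self.timeout = silence_timeout      # preset silence time, in seconds
        self.index = 0
        self.last_voice = 0.0

    @property
    def active_seat(self):
        """Seat whose microphone should be active, or None when the schedule is done."""
        return self.order[self.index] if self.index < len(self.order) else None

    def tick(self, now, voice_detected):
        """Advance to the next scheduled seat after `timeout` seconds of silence."""
        if voice_detected:
            self.last_voice = now
        elif self.active_seat is not None and now - self.last_voice >= self.timeout:
            self.index += 1
            self.last_voice = now
        return self.active_seat

    def hand_over(self, seat, now):
        """Explicit transfer of the floor to `seat` (must be in the schedule)."""
        self.index = self.order.index(seat)
        self.last_voice = now
        return self.active_seat

scheduler = FloorScheduler(["seat_1", "seat_2", "seat_3"], silence_timeout=5.0)
first = scheduler.tick(1.0, voice_detected=True)     # seat_1 is speaking
still = scheduler.tick(4.0, voice_detected=False)    # 3 s of silence, no change
moved = scheduler.tick(6.5, voice_detected=False)    # 5.5 s of silence, next seat
```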

The audio collection unit 163 controls the microphone unit 120 under the control of the conference scheduling unit 161 and uses the activated microphone to collect the remarks of the speaker at whose seat that microphone is placed. It manages ID information and position information for each of the at least one microphones in the microphone unit 120, decides which microphone to activate, and controls microphone activation and audio signal collection accordingly. The ID information and position information may be included in the conference schedule information, so the audio collection unit 163 may decide on microphone activation according to the conference schedule information stored in the storage unit 150. When the user generates an input signal for activating the microphone at a specific position by pointing to a specific speaker in the overall image shown on the display unit 140, the audio collection unit 163 may activate the microphone assigned to that position based on the input signal. To this end, when the overall image is collected and output to the display unit 140, the conference scheduling unit 161 may map the positions of the speakers in the overall image on the display unit 140 to the positions of the microphones and pass the mapping information to the audio collection unit 163.
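The mapping between speaker positions in the overall image and microphones may be sketched as a simple hit test. The sketch assumes that each detected face region is stored as a bounding box (left, top, width, height) in display coordinates together with the ID of the microphone assigned to that seat; the field names `box` and `mic_id` and the sample values are hypothetical.

```python
def mic_for_click(click, face_regions):
    """Return the microphone ID mapped to the face region containing the click."""
    x, y = click
    for region in face_regions:
        left, top, width, height = region["box"]
        if left <= x < left + width and top <= y < top + height:
            return region["mic_id"]
    return None  # the click did not land on any face region

# Hypothetical mapping built when the overall image is first displayed.
regions = [
    {"box": (40, 30, 60, 80), "mic_id": "mic_1"},
    {"box": (200, 35, 60, 80), "mic_id": "mic_2"},
]
target = mic_for_click((220, 70), regions)
```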

The image recognition and collection unit 165 controls the camera unit 130 under the control of the conference scheduling unit 161 and uses the activated camera to collect the video of the speaker at a specific position. It may perform image recognition on the collected video and determine who the speaker is using the sample information stored in the storage unit 150. To this end, the conference scheduling unit 161 may obtain person information for the conference participants in advance, and may obtain each participant's face recognition information beforehand so that the person information for a face recognized during the conference can be retrieved.

The conference record storage and output unit 167 receives, under the control of the conference scheduling unit 161, the video collected by the image recognition and collection unit 165, the audio signal collected by the microphone, and the person information, and controls their output to the display unit 140. It may store this information in the storage unit 150 automatically according to a preset schedule or in response to user input. Alternatively, the conference record storage and output unit 167 may collect the speaker's audio signal and video according to user input and preset conference schedule information, store them as a conference record, and play them back at the same time. The information output by the conference record storage and output unit 167 is described in more detail with reference to FIG. 4.

FIG. 4 is an exemplary screen view illustrating a screen interface provided by the face recognition conference shorthand system 100 according to an embodiment of the present invention.

Referring to FIG. 4, the face recognition conference shorthand system 100 may, as shown on screen 401, activate the camera unit 130 to collect an image that includes all attendees of the conference and output the collected image as the entire image 141 in a predetermined area of the display unit 140. In this process, the system 100 may perform image recognition on the entire image 141, extract the face regions of the attendees, and assign a link function to each extracted face region. That is, when the user indicates an extracted face region, the system 100 may support a link function that focuses on the face region of the corresponding attendee, performs image recognition on the focused face region to retrieve the attendee's person information, and activates the microphone assigned to the attendee's position. Alternatively, when at least one of the attendees starts speaking, the system 100 may automatically identify the speaker through face recognition and perform the same functions.
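
By way of example only, the link function assigned to each extracted face region can be pictured as a hit test that maps a click position in the entire image to the attendee whose face region contains it, together with the microphone assigned to that attendee. The rectangular region model and the names `FaceRegion` and `resolve_click` are hypothetical and are introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FaceRegion:
    """Hypothetical face region extracted from the entire image."""
    attendee_id: str
    x: int          # left edge in pixels
    y: int          # top edge in pixels
    width: int
    height: int
    mic_id: int     # microphone assigned to this attendee's position

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) lies inside this region."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def resolve_click(regions: List[FaceRegion], px: int, py: int) -> Optional[FaceRegion]:
    """Return the first face region containing the clicked point, or None."""
    for region in regions:
        if region.contains(px, py):
            return region
    return None
```

A caller would then focus the camera on the returned region, look up the attendee's person information, and activate `mic_id`; a click outside every face region returns `None` and triggers nothing.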

Accordingly, when the user clicks the face region of the first speaker 143 located at the top right of the entire image 141, the face recognition conference shorthand system 100 may enlarge the face region of the first speaker 143 and output it at one side of the lower part of the display unit 140. When the first speaker 143 finishes speaking, the system 100 may focus on the face of the newly designated second speaker 145 and output the focused face of the second speaker 145 at one side of the display unit 140. At this time, the system 100 may remove the image corresponding to the first speaker 143 from the display unit 140 and output only the image of the second speaker 145.

Meanwhile, even if the user does not indicate a specific speaker, the face recognition conference shorthand system 100 may automatically focus on the face of the speaker whose turn it is according to the conference schedule information. That is, when it is time for the first speaker 143 to speak according to the conference schedule information, the system 100 may focus on the face of the first speaker 143 and output it, enlarged, at one side of the display unit 140. When the time allotted to the first speaker 143 has elapsed, or when there has been no speech for a certain period, the system 100 may enlarge the face image of the second speaker 145 according to the conference schedule information and output it to the display unit 140. At this time, the system 100 may keep the image of the first speaker 143 on the display unit 140 as the image of the previous speaker, or may remove it from the display unit 140, depending on the settings. When the next speaker is selected, the image of the first speaker 143 may be removed and the image of the second speaker 145 may be output to the display unit 140 as the image of the previous speaker.
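
The schedule-driven switching described above may be sketched, purely for illustration, as a small state holder that advances to the next scheduled speaker when the allotted time has elapsed or when silence has lasted longer than a limit. The class and field names (`Slot`, `ScheduleFocus`, `allotted_seconds`, `silence_limit`) and the default limit of 10 seconds are assumptions of this example, not values taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """Hypothetical schedule entry: who speaks and for how long."""
    speaker_id: str
    allotted_seconds: float


class ScheduleFocus:
    """Tracks the current and previous speaker according to a schedule."""

    def __init__(self, slots: List[Slot], silence_limit: float = 10.0):
        self.slots = list(slots)
        self.silence_limit = silence_limit   # assumed silence timeout in seconds
        self.index = 0
        self.previous: Optional[str] = None  # speaker shown as "previous speaker"

    @property
    def current(self) -> Optional[str]:
        if self.index < len(self.slots):
            return self.slots[self.index].speaker_id
        return None

    def update(self, elapsed_in_slot: float, silent_for: float) -> Optional[str]:
        """Advance to the next slot if the time is up or the speaker fell silent."""
        if self.index < len(self.slots):
            slot = self.slots[self.index]
            if (elapsed_in_slot >= slot.allotted_seconds
                    or silent_for >= self.silence_limit):
                self.previous = slot.speaker_id
                self.index += 1
        return self.current
```

The `previous` attribute corresponds to the image that may be kept on the display as the previous speaker; whether it is actually shown would depend on the settings.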

In addition, when there is no separate conference schedule information, the face recognition conference shorthand system 100 may perform image recognition on the face of each attendee in the entire image 141, analyzing the area around the mouth in finer detail, to determine which attendees are speaking. The system 100 may then enlarge the faces of the attendees who are speaking, output them to the display unit 140, and activate the microphones assigned to the corresponding positions. In this process, the system 100 may also activate the microphones in advance and determine the speaker according to the audio signals the microphones collect.
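
For the variant in which the microphones are activated in advance and the speaker is determined from the collected audio, one possible sketch is to compute the signal level of each microphone and treat the positions whose level exceeds a threshold as speaking. The use of a root-mean-square level, the threshold value, and the names `detect_speakers`, `frames_by_mic`, and `seat_by_mic` are assumptions made for illustration only.

```python
import math
from typing import Dict, List


def rms(samples: List[float]) -> float:
    """Root-mean-square level of one audio frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speakers(
    frames_by_mic: Dict[int, List[float]],  # hypothetical: mic id -> latest audio frame
    seat_by_mic: Dict[int, str],            # hypothetical: mic id -> assigned position
    threshold: float = 0.1,                 # assumed level above which a mic counts as active
) -> List[str]:
    """Return the positions whose assigned microphone is picking up speech."""
    active = []
    for mic_id in sorted(frames_by_mic):
        if mic_id in seat_by_mic and rms(frames_by_mic[mic_id]) >= threshold:
            active.append(seat_by_mic[mic_id])
    return active
```

The returned positions could then be used to focus the camera and enlarge the corresponding faces; a practical system would likely smooth the level over several frames before switching.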

Next, the face recognition conference shorthand system 100 may store information on the speakers in list form according to a user operation or environment setting, and output the stored list as shown on screen 403. To this end, the system 100 may use the conference name as a category and may create and manage list items that include face image information of the attendees who spoke at the conference, speaking time information including the start and end times of each speech, person information of the speakers, and an audio file 147 of each speaker. Here, the audio file 147 may include at least one of the speaker's voice waveform and the content of the speaker's speech. When an administrator operating the system 100 clicks the audio file 147, the system 100 may output the content of the speaker's speech through the audio processing unit and, depending on the settings, may also output the voice waveform to the display unit 140.
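
The list items described above might be modeled, as one possible sketch, by a record per speech grouped under the conference name. The type names `SpeechEntry` and `ConferenceRecord`, the use of file paths for the face image and the audio file, and the ordering by start time are assumptions of this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SpeechEntry:
    """One list item: a single speech by one speaker."""
    face_image_path: str   # hypothetical path to the speaker's face image
    start: datetime        # start time of the speech
    end: datetime          # end time of the speech
    person_info: dict      # person information of the speaker
    audio_file_path: str   # hypothetical path to the speaker's audio file

    @property
    def duration_seconds(self) -> float:
        return (self.end - self.start).total_seconds()


@dataclass
class ConferenceRecord:
    """List of speeches grouped under the conference name as the category."""
    conference_name: str
    entries: List[SpeechEntry] = field(default_factory=list)

    def add(self, entry: SpeechEntry) -> None:
        """Add an entry and keep the list ordered by speech start time."""
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.start)
```

Keeping the entries ordered by start time lets the list be displayed in speaking order regardless of the order in which the items were stored.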

The configuration, roles, and functions of the face recognition conference shorthand system 100 according to an embodiment of the present invention, and the screen interface supported by the system, have been described above. A conference shorthand method based on the face recognition conference shorthand system 100 is described below in more detail with reference to the drawings.

FIG. 5 is a flowchart illustrating a conference shorthand method according to an embodiment of the present invention.

Referring to FIG. 5, the face recognition conference shorthand system 100 first checks the conference schedule information in step 501. That is, the system 100 checks the conference schedule information to determine the position of the speaker who is to speak first, activates the microphone assigned to that position, and adjusts the focus of the camera unit 130 to that position. If there is no conference schedule information, the system 100 may collect the entire image and recognize the faces of the attendees in it to find the speaker who is speaking, or may determine, from the audio signal collected by the microphone unit 120, that the attendee seated at the position where the audio signal was collected is the speaker.

Next, when schedule information exists, the face recognition conference shorthand system 100 may, in step 503, collect audio and image signals according to the schedule information and perform image recognition on the collected image. When there is no schedule information, the system 100 may collect the audio signal corresponding to the speech of the speaker determined in step 501 together with an image of that speaker, and perform image recognition on the collected image.

Next, in step 505, the face recognition conference shorthand system 100 may store a conference record based on the collected information and the person information obtained through image recognition, and output it through the display unit and the audio processing unit. At this time, the system 100 may collect the speaker's audio signal and image according to a user input and preset conference schedule information, and store them as a conference record while simultaneously playing them back. In this process, the system 100 may output to the display unit 140 an image corresponding to the speaker's face, a voice waveform image corresponding to the collected audio signal, and the speaker's person information. In addition, the user may take shorthand through the input unit 110 while watching the conference video being recorded in real time or while playing back a recorded conference video.

Meanwhile, while the conference proceeds according to the schedule information, the face recognition conference shorthand system 100 may check in step 507 whether audio is being received from a microphone that is not scheduled. If audio is received, the system 100 branches to step 509, collects the image and audio signals of the corresponding area, and performs image recognition on the collected image signal. The system 100 may then branch back to before step 505 and repeat the subsequent steps.

If no audio is received in step 507, the system 100 proceeds to step 511 and checks whether the conference has ended. If there is no input signal indicating the end of the conference, the system 100 may branch back to before step 501 and repeat the subsequent steps.
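
The branching among steps 501 to 511 can be summarized, as a simplified sketch only, by a loop that records the order in which the steps are visited. Each cycle is described by a hypothetical dictionary with an `unscheduled_audio` list (positions of unscheduled microphones that pick up audio) and an `end` flag (an input signal indicating the end of the conference); these keys and the function name `run_conference` are assumptions of this example, and the actual collection, recognition, and storage operations, including the handling of a missing schedule in step 501, are omitted.

```python
from typing import Dict, Iterable, List


def run_conference(cycles: Iterable[Dict]) -> List[int]:
    """Return the sequence of flowchart step numbers visited for the given cycles."""
    trace: List[int] = []
    for cycle in cycles:
        trace.append(501)  # check the conference schedule information
        trace.append(503)  # collect audio and image signals, recognize the image
        trace.append(505)  # store the conference record and output it
        pending = list(cycle.get("unscheduled_audio", []))
        while True:
            trace.append(507)  # audio received from an unscheduled microphone?
            if not pending:
                break
            pending.pop(0)
            trace.append(509)  # collect image and audio of that area, recognize
            trace.append(505)  # branch back to before step 505
        trace.append(511)  # has the end of the conference been indicated?
        if cycle.get("end", False):
            break
    return trace
```

For instance, a first cycle with one unscheduled speaker followed by a cycle that ends the conference visits step 505 twice in the first cycle and once in the second.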

The embodiments of the present invention disclosed in this specification and the drawings are merely specific examples presented to aid understanding and are not intended to limit the scope of the present invention. It will be apparent to those of ordinary skill in the art to which the present invention pertains that other modifications based on the technical idea of the present invention can be practiced in addition to the embodiments disclosed herein.

100: face recognition conference shorthand system
110: input unit
120: microphone unit
130: camera unit
140: display unit
150: storage unit
160: control unit

Claims

A face recognition conference shorthand system comprising:
a camera unit including at least one camera for collecting an image including at least one attendee of a conference;
a microphone unit including at least one microphone for collecting an audio signal uttered by at least one speaker among the attendees; and
a control unit configured to collect the audio signal of the speaker and an image of the speaker according to a user input and preset conference schedule information, and to store them as a conference record.

The system of claim 1, further comprising:
a display unit configured to output at least one of an image of the speaker collected by the camera unit, an audio waveform of the audio signal, and an entire image including all the attendees.

The system of claim 2, further comprising:
an input unit configured to generate an input signal for indicating, in the entire image output to the display unit, a speaker who is speaking.

The system of claim 3, wherein the display unit comprises:
an area in which the entire image is output; and
an area in which the face of the speaker indicated through the input unit in the entire image is enlarged and output.

The method of claim 1,
A storage unit for storing the conference record and storing person information of the speaker;
Face recognition conference shorthand system further comprising.

The system of claim 5, wherein the control unit comprises:
a conference scheduling unit configured to determine at least one of a speaker identified according to the preset conference schedule information, a speaker indicated by the user input, and a speaker recognized from the entire image including the attendees, to control collection of the audio signal and the image signal of the determined speaker, and to control collection of person information of the determined speaker;
an audio collection unit configured to collect an audio signal by activating a microphone assigned to the position of the speaker determined by the conference scheduling unit;
an image recognition and collection unit configured to collect an image of the speaker by focusing on the speaker determined by the conference scheduling unit; and
a conference record storage and output unit configured to output and store at least one of the audio signal, the image, and the person information.

The system of claim 6, wherein the conference record storage and output unit
collects the audio signal of the speaker and the image of the speaker according to the user input and the preset conference schedule information, and simultaneously records and stores them as a conference record.

A face recognition conference shorthand method comprising:
identifying the position of a speaker who is speaking or is to speak among at least one attendee of a conference;
collecting an image of the speaker and an audio signal corresponding to the speaker's speech; and
storing the collected image and audio signal of the speaker.

The method of claim 8, wherein identifying the position of the speaker comprises at least one of:
outputting an entire image including the attendees to a display unit and designating, as the speaker, a specific attendee among the attendees included in the entire image output to the display unit;
checking previously stored schedule information and designating a specific attendee as the speaker according to the schedule information;
acquiring an entire image including the attendees, performing image recognition on the face image of each attendee included in the entire image, and designating, as the speaker, an attendee recognized as speaking; and
collecting audio signals with microphones assigned to the attendees and determining, as the speaker, the attendee at the position where an audio signal is collected.

The method of claim 8, further comprising:
storing sample information for face recognition of the attendees;
storing person information of the attendees; and
performing image recognition on the speaker's image to identify the speaker's person information, and storing the identified person information together with the speaker's image and audio signal.

The method of claim 8, further comprising:
outputting at least one of an image of the speaker collected by a camera unit, an audio waveform of the audio signal, and an entire image including all the attendees.

The method of claim 11, further comprising:
when a specific speaker is indicated while the entire image is output, enlarging and outputting the face of the indicated speaker.

The method of claim 8, wherein the storing comprises:
collecting the audio signal of the speaker and the image of the speaker according to a user input and preset conference schedule information, and storing them as a conference record while simultaneously playing them back.