KR20190016683A

KR20190016683A - Apparatus for automatic conference notetaking using mems microphone array

Info

Publication number: KR20190016683A
Application number: KR1020170100909A
Authority: KR
Inventors: 김영기; 강준구; 정욱진; 배석형; 김용관; 김민아; 김우종; 김하림; 남현욱; 문혜미; 최종훈
Original assignee: SM Instruments Co., Ltd. ((주)에스엠인스트루먼트); Korea Advanced Institute of Science and Technology (한국과학기술원)
Priority date: 2017-08-09
Filing date: 2017-08-09
Publication date: 2019-02-19
Also published as: KR101976937B1

Abstract

The present invention relates to an apparatus for automatically taking minutes using a microphone array. The apparatus identifies the position of a speaker by detecting the reception coordinates of a sound signal sensed by the microphones, recognizes each speaker's voice and converts it into text, displays the text on the camera image, and generates log information so that minutes are taken automatically. Because sound is sensed through the microphones, the speaker is located from the sensed sound signal, and each speaker's voice is converted into text, the conversation content of each speaker can be distinguished and reviewed. In addition, by sequentially recording the recognized text for each speaker to generate log information, the apparatus takes minutes automatically.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to an apparatus that recognizes a speaker's voice by sensing sound signals and automatically takes meeting minutes. More particularly, it relates to an apparatus for automatically taking minutes using a microphone array, which identifies the position of a speaker by detecting the reception coordinates of a sound signal sensed by the microphones, recognizes the speaker's voice and converts it into text, displays the text on the camera image, and generates log information so that the minutes are written automatically.

In settings such as meetings, lectures, and discussions, conversation proceeds smoothly when the speaker and the audience are close to each other. As the distance between them grows, however, communication can break down. In particular, in a large space such as a large lecture hall, it is difficult for every member of the audience to hear the lecturer without a voice-amplifying device such as a microphone. Moreover, minutes are conventionally taken either by recording the meeting and later writing the minutes from the recorded audio file, or by employing professionals such as stenographers.

Korean Patent No. 10-0936244 (hereinafter the "prior art document") relates to an intelligent voice input device for a robot that uses a stereo camera to locate a person's mouth and points a directional microphone at that location, thereby acquiring accurate speech from which ambient noise has been removed. Because the sound source is identified through image recognition and the directional microphone is rotated horizontally and vertically toward it, accurate speech with minimal ambient noise can be obtained.

The prior art document recognizes a user from camera images and steers the microphone toward that user in order to receive the user's voice efficiently; its structure recognizes only one user through the camera.

The prior art document is therefore difficult to use in situations with many participants, such as meetings or lectures. Accordingly, there is a need for a system that can recognize multiple participants, recognize their voices, and automatically take minutes.

Korean Patent No. 10-0936244 (title: Intelligent Voice Input Device for Robot and Operating Method Thereof; registration date: 2010.01.04.)

To solve the above problems, an object of the present invention is to sense sound through microphones, identify the position of a speaker from the sound signal, and recognize the speaker's voice and convert it into text.

Another object of the present invention is to generate log information and write minutes by sequentially recording the text recognized for each speaker.

An apparatus for automatically taking minutes using a microphone array according to the present invention includes: a sound sensing unit that senses sound through a microphone array; a sensing area setting unit that sets a sensing area of the sound sensing unit and sets grid position coordinates of the sensing area in order to recognize the position of sound generated within the sensing area; a sound position identifying unit that identifies a sound position from the position coordinates of the sound sensed in the sensing area; a speaker designating unit that designates the sound position as a speaker; a text conversion unit that recognizes the sound of the speaker designated by the speaker designating unit as the speaker's voice and converts it into text; an image capturing unit that captures a specific area; a mapping unit that maps the sound position to the image information captured by the image capturing unit; an image output unit that displays the text in real time on the image information to which the sound position is mapped; a log generation unit that sequentially records the text to generate log information; and a minutes writing unit that generates minutes information from the log information.

The sound position identifying unit according to the present invention includes: a beam power calculation unit that sums the sound signals received for each grid position coordinate to calculate a beam power level corresponding to each grid position coordinate; a beamforming generation unit that applies to the beam power level a weight that minimizes the variance of the output, thereby generating a beamforming output value for sound source extraction; and a sound position designating unit that designates, as the sound position, the coordinates of the position at which the beamforming output value is strongest.

By sensing sound through microphones, identifying the position of the speaker from the sound signal, and recognizing the speaker's voice and converting it into text, the present invention makes it possible to distinguish and review the conversation content of each speaker.

In addition, by sequentially recording the text recognized for each speaker to generate log information and write minutes, the present invention takes minutes automatically.

FIG. 1 is a block diagram of an apparatus for automatically taking minutes using a microphone array according to the present invention.
FIG. 2 is a diagram illustrating an embodiment of the apparatus for automatically taking minutes using a microphone array according to the present invention.
FIG. 3 is a diagram for explaining a method of identifying a speaker by sound sensing according to the present invention.
FIG. 4 is a diagram for explaining a method of mapping a speaker's position information to image information according to the present invention.
FIG. 5 is a diagram for explaining log information generation and automatic minutes writing according to the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing the embodiments, detailed descriptions of related known technologies are omitted where they would unnecessarily obscure the gist of the present invention.

FIG. 1 is a block diagram of an apparatus for automatically taking minutes using a microphone array according to the present invention. Referring to FIG. 1, the apparatus may include a sound sensing unit 100, an image capturing unit 200, a sound processing unit 300, and an image output unit 400.

The sound sensing unit 100 senses sound through a microphone array. It takes the form of an array in which at least one microphone is arranged in a specific pattern; for example, a plurality of microphones may be arranged horizontally, vertically, diagonally, or in a matrix. The microphones of the sound sensing unit 100 are preferably small devices such as MEMS (Micro Electro Mechanical Systems) microphones. The microphones may be arranged at regular intervals, or the user may arrange them at different spacings.

The image capturing unit 200 captures a specific area to generate image information. It captures the same area as the area in which the sound sensing unit 100 can sense sound.

Referring to FIG. 1, the sound processing unit 300 may include a sensing area setting unit 310, a sound position identifying unit 320, a speaker designating unit 330, a text conversion unit 340, a mapping unit 350, a log generation unit 360, and a minutes writing unit 370.

The sensing area setting unit 310 sets the sensing area of the sound sensing unit 100 and sets grid position coordinates of the sensing area in order to recognize the position of sound generated within it. The sensing area (region of interest) is a virtual area within which the microphones can identify sound.
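A minimal sketch of the grid position coordinates, assuming the sensing area is a flat plane a fixed distance in front of the array and that cells are numbered 1 to rows × cols in row-major order from the top-left (the text does not spell out the numbering used in Fig. 3). `GridPoint` and `make_grid` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GridPoint:
    index: int  # 1-based label, row-major from the top-left (assumed numbering)
    row: int
    col: int
    x: float    # metres, horizontal offset from the array centre
    y: float    # metres, vertical offset from the array centre
    z: float    # metres, distance of the sensing plane from the array


def make_grid(rows: int, cols: int, width: float, height: float, distance: float) -> List[GridPoint]:
    """Place rows x cols candidate source points at the cell centres of a
    width x height plane located `distance` metres in front of the array."""
    dx, dy = width / cols, height / rows
    points = []
    for r in range(rows):
        for c in range(cols):
            points.append(GridPoint(
                index=r * cols + c + 1,
                row=r,
                col=c,
                x=-width / 2 + (c + 0.5) * dx,
                y=height / 2 - (r + 0.5) * dy,  # row 0 is the top row
                z=distance,
            ))
    return points


grid = make_grid(3, 5, width=5.0, height=1.5, distance=2.0)
print(len(grid), grid[2].x, grid[2].y)  # 15 0.0 0.5
```

Using cell centres means each coordinate stands for a whole region of the room, which matches the coarse 3 × 5 layout of Fig. 3; a finer grid trades computation for localization resolution.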

The sound position identifying unit 320 identifies the sound position from the position coordinates of the sound sensed in the sensing area. It may include a beam power calculation unit 321, a beamforming unit 322, and a sound position designating unit 323.

The beam power calculation unit 321 sums the sound signals received for each grid position coordinate to calculate a beam power level corresponding to each grid position coordinate.

For example, if the grid position coordinates set by the sensing area setting unit 310 form a 3 × 5 grid with a total of 15 position coordinates, the beam power calculation unit 321 sums the sound signals received at each position coordinate and calculates the beam power level from the summed signal at each coordinate.

Because the microphones are installed a certain distance apart, sound reaches each microphone at a different angle and therefore at a different time. To compensate for these differences in arrival time, the beam power calculation unit 321 assigns each position coordinate a time delay according to the angle of arrival. Using the position coordinate with the highest beam power level calculated by the beam power calculation unit 321 as the reference, the time delays are applied to each position coordinate so that the phase angles of the sound signals are aligned.

This method of extracting the beam power level can be expressed by the following equation.

[Formula 1]
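Formula 1 is not reproduced in this text. The Python sketch below shows a conventional time-domain delay-and-sum computation of a beam power level for every grid position coordinate, which is one way to read the preceding paragraphs; it is not the patent's formula. The 3 × 3 planar array with 0.2 m pitch, the 48 kHz sample rate, the speed of sound, rounding delays to whole samples, and all function names are assumptions. For simplicity each candidate point's delays are taken relative to its nearest microphone rather than to the strongest coordinate, and a synthetic white-noise talker is placed at position coordinate '3' to mirror Fig. 3.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
FS = 48_000             # sample rate in Hz (assumption)


def delay_samples(point, mic, fs=FS, c=SPEED_OF_SOUND):
    """Propagation delay from `point` to `mic`, rounded to whole samples."""
    return round(math.dist(point, mic) / c * fs)


def beam_power(mic_signals, mic_positions, point):
    """Delay-and-sum beam power for one candidate grid position."""
    delays = [delay_samples(point, m) for m in mic_positions]
    base = min(delays)
    rel = [d - base for d in delays]  # delay of each mic relative to the nearest one
    n_out = len(mic_signals[0]) - max(rel)
    total = 0.0
    for n in range(n_out):
        # advance each channel by its relative delay so the wavefronts line up, then sum
        y = sum(sig[n + r] for sig, r in zip(mic_signals, rel))
        total += y * y
    return total / n_out


def beam_power_map(mic_signals, mic_positions, grid):
    """Beam power level for every grid position coordinate."""
    return [beam_power(mic_signals, mic_positions, p) for p in grid]


def simulate(source, mic_positions, n=4000, seed=0):
    """Synthetic input: white noise from `source`, delayed per microphone."""
    rng = random.Random(seed)
    delays = [delay_samples(source, m) for m in mic_positions]
    lead = max(delays)
    s = [rng.gauss(0.0, 1.0) for _ in range(n + lead)]
    # microphone m hears the source d_m samples late: r_m[t] = s[t + lead - d_m]
    return [[s[t + lead - d] for t in range(n)] for d in delays]


# Assumed 3 x 3 planar MEMS array, 0.2 m pitch, lying in the plane z = 0
MICS = [(0.2 * i, 0.2 * j, 0.0) for j in (-1, 0, 1) for i in (-1, 0, 1)]
# 3 x 5 grid on a plane 2 m in front of the array, row-major from the top-left
GRID = [(x, y, 2.0) for y in (0.5, 0.0, -0.5) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

signals = simulate(GRID[2], MICS)  # talker at position coordinate '3'
powers = beam_power_map(signals, MICS, GRID)
best = max(range(len(powers)), key=powers.__getitem__) + 1  # 1-based label
print(best)  # 3
```

At the true position all nine channels add coherently (power about 9² times one channel), while at other positions only partially aligned channels add, so the maximum of the map marks the talker.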

The beamforming unit 322 applies to the beam power level a weight that minimizes the variance of the output, thereby generating a beamforming output value for sound source extraction. This suppresses interference from directions other than the desired one, because the sound received by the microphones contains not only the speaker's voice but also background noise. By minimizing the variance of the beamforming output, the processing concentrates on the sound signal arriving from the direction of the sound.

The weight that minimizes the variance of the beamforming output can be expressed by the following equation.

[Formula 2]

The beamforming output value for sound source extraction obtained with this weight is given by the following equation.

[Formula 3]
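Formulas 2 and 3 are not reproduced in this text. A weight that minimizes the output variance while keeping unit gain toward the look direction is the classic MVDR (Capon) beamformer, w = R⁻¹a / (aᴴR⁻¹a), with output y = wᴴx, where R is the spatial covariance of the microphone signals and a is the steering vector of the look direction. The sketch below assumes that this is the weight the patent describes. The standard formulation applies the weights to the microphone signals, which is what the sketch does; it also assumes a narrowband model at a single frequency bin and, for brevity, a uniform linear array with half-wavelength spacing. All names are hypothetical.

```python
import cmath
import math


def ula_steering(n_mics, theta):
    """Far-field steering vector of a half-wavelength uniform linear array
    (assumed geometry); theta is the arrival angle in radians."""
    return [cmath.exp(-1j * math.pi * m * math.sin(theta)) for m in range(n_mics)]


def covariance(sources, noise_power, n):
    """Spatial covariance R = sum_k P_k a_k a_k^H + noise_power * I."""
    R = [[0j] * n for _ in range(n)]
    for power, a in sources:
        for i in range(n):
            for j in range(n):
                R[i][j] += power * a[i] * a[j].conjugate()
    for i in range(n):
        R[i][i] += noise_power
    return R


def solve(A, b):
    """Solve A x = b (complex) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x


def mvdr_weights(R, a):
    """w = R^{-1} a / (a^H R^{-1} a): minimum output variance, unit gain toward a."""
    r_inv_a = solve(R, a)
    denom = sum(ai.conjugate() * v for ai, v in zip(a, r_inv_a))  # a^H R^{-1} a
    return [v / denom for v in r_inv_a]


def beam_output(w, x):
    """Beamformer output y = w^H x for one snapshot (or a steering vector)."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


N = 8
a_talker = ula_steering(N, 0.0)              # talker straight ahead
a_noise = ula_steering(N, math.radians(40))  # strong interferer at 40 degrees
R = covariance([(1.0, a_talker), (100.0, a_noise)], noise_power=0.01, n=N)
w = mvdr_weights(R, a_talker)
print(abs(beam_output(w, a_talker)))  # about 1: the talker passes undistorted
print(abs(beam_output(w, a_noise)))   # near 0: the interferer is nulled
```

With white noise only (R = I) the same code reduces to ordinary delay-and-sum weights a/N, which is why MVDR is often described as a data-adaptive refinement of the beam power computation above.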

The sound position designating unit 323 designates, as the sound position, the coordinates of the position at which the beamforming output value is strongest.

The speaker designating unit 330 designates the sound position as a speaker. That is, the position with the strongest beamforming output value is taken to be the position from which a speaker's voice is uttered, and the speaker designating unit 330 determines that a speaker is located at the designated sound position.

The text conversion unit 340 recognizes the sound of the speaker designated by the speaker designating unit 330 as the speaker's voice and converts it into text. The conversion uses an application capable of speech-to-text conversion, such as the speech recognition API provided by Google. The text conversion unit 340 may be connected to Google's speech recognition API over a wireless network.
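The description names Google's speech recognition API only as an example and does not show how it is called, so the sketch below keeps the recognizer abstract: `recognize` is a hypothetical callable standing in for any speech-to-text service, and no real client library is used. It assumes the beamformed audio arrives as fixed-length frames, each tagged with the grid position designated for it, and starts a new utterance whenever the designated position changes. `Utterance` and `transcribe_by_speaker` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class Utterance:
    position: int  # designated grid position coordinate (1-based)
    start_ms: int  # start time relative to the first frame, in milliseconds
    text: str


def transcribe_by_speaker(
    frames: Iterable[Tuple[int, Sequence[float]]],
    frame_ms: int,
    recognize: Callable[[List[float]], str],
) -> List[Utterance]:
    """Split the beamformed stream wherever the designated position changes and
    send each run of frames to `recognize` (a hypothetical speech-to-text callable)."""
    utterances = []
    frame_count = 0
    for position, run in groupby(frames, key=lambda frame: frame[0]):
        run = list(run)
        audio = [sample for _, chunk in run for sample in chunk]
        utterances.append(Utterance(position, frame_count * frame_ms, recognize(audio)))
        frame_count += len(run)
    return utterances


# Usage with a stand-in recognizer that only reports how much audio it received
frames = [(3, [0.0] * 160)] * 3 + [(1, [0.0] * 160)] * 2
for u in transcribe_by_speaker(frames, 10, lambda audio: f"<{len(audio)} samples>"):
    print(u)
```

In practice `recognize` would wrap the chosen service's client, and very short runs caused by momentary mislocalization would probably need smoothing; neither detail is described in the patent.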

The mapping unit 350 maps the sound position to the image information captured by the image capturing unit 200.

The image output unit 400 displays the converted text in real time on the image information to which the sound position is mapped.
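The text does not say how the mapping unit 350 relates grid cells to pixels. The simplest assumption, used in this sketch, is that the camera is co-located with the array and its frame spans exactly the 3 × 5 grid, so each cell maps to an equal rectangle of the image, and the speaker's text is drawn at the top centre of that rectangle. `cell_to_pixels` and `caption_anchor` are hypothetical helpers.

```python
from typing import Tuple


def cell_to_pixels(index: int, rows: int, cols: int,
                   frame_w: int, frame_h: int) -> Tuple[int, int, int, int]:
    """Pixel box (left, top, right, bottom) of a 1-based, row-major grid cell,
    assuming the grid spans the whole camera frame."""
    if not 1 <= index <= rows * cols:
        raise ValueError("grid index out of range")
    r, c = divmod(index - 1, cols)
    cell_w, cell_h = frame_w / cols, frame_h / rows
    return (round(c * cell_w), round(r * cell_h),
            round((c + 1) * cell_w), round((r + 1) * cell_h))


def caption_anchor(index: int, rows: int, cols: int,
                   frame_w: int, frame_h: int) -> Tuple[int, int]:
    """Point at which to draw the speaker's text: top centre of the cell."""
    left, top, right, _ = cell_to_pixels(index, rows, cols, frame_w, frame_h)
    return ((left + right) // 2, top)


print(cell_to_pixels(3, 3, 5, 1280, 720))  # (512, 0, 768, 240)
```

A camera offset from the array, or a grid that does not fill the frame, would need a calibrated mapping (for example a planar homography) instead of this proportional split.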

A method of identifying the position of a speaker by sound sensing and outputting it as image information is described below with reference to FIGS. 2 to 4. This description further clarifies the configuration of the apparatus for automatically taking minutes using a microphone array according to the present invention.

FIG. 2 is a diagram illustrating an embodiment of the apparatus for automatically taking minutes using a microphone array according to the present invention. Referring to FIG. 2, there are three speakers, P1, P2, and P3, and the sound sensing unit 100 receives sound from them. The image capturing unit 200 captures the area in which the sound sensing unit 100 can receive sound. The sound information sensed by the sound sensing unit 100 and the image information captured by the image capturing unit 200 are transmitted to the sound processing unit 300.

The sound processing unit 300 identifies the speaker from the received sound information. A method of identifying a speaker by sound sensing is described below with reference to FIG. 3.

Referring to "Grid position coordinate setting" in FIG. 3, a virtual sensing area recognizable through the sound sensing unit 100 is set, and grid position coordinates are set within the sensing area. In the embodiment of FIG. 3, a 3 × 5 grid of position coordinates is set, and each position coordinate is a position at which sound can be identified. The arrangement of the position coordinates differs from the arrangement of the microphone array and is set by the user.

The beam power calculation unit 321 sums the received sound signals for each grid position coordinate to calculate the beam power level. The sound position identifying unit 320 identifies the position coordinate with the strongest of the calculated beam power levels as the position from which the sound (the speaker's voice) was received. To minimize interference from noise received from positions other than where the speaker's voice is received, a weight that minimizes the variance of the output is applied to the beam power level to generate a beamforming output value for sound source extraction.

Referring to "Speaker position designation" in FIG. 3, the sound position identifying unit 320 has identified position coordinate '3' as the position at which sound was received, and the speaker designating unit 330 designates a speaker as located at position coordinate '3'.

FIG. 4 is a diagram for explaining a method of mapping the sound position to image information according to the present invention.

Referring to "Speaker position and image information mapping" in FIG. 4, the mapping unit 350 maps the image information captured by the image capturing unit 200 to the grid position coordinates set by the sensing area setting unit 310. Because the speaker designating unit 330 has designated the sound position within the grid position coordinates, the speaker's position is identified within the image information.

The image output unit 400 displays the text converted from the speaker's voice on the image information to which the sound position is mapped. "Text display by speaker" in FIG. 4 shows the speaker's voice converted into text and displayed. FIG. 4 is the case in which a speaker is designated at position coordinate '3': the sound sensed at position coordinate '3' is recognized as the speaker's voice and output on screen as text.

The image output unit 400 according to the present invention can display the text of the speakers' voices separately for each speaker. In the "Text display by speaker" example, position coordinate '3' is designated as 'User 1'; if another speaker, 'User 2', is designated at position coordinate '1', the sound received from the position of 'User 2' can be displayed separately from that of 'User 1'.
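One way to realize these per-speaker labels, as a sketch: the strongest beamforming output picks the grid position (1-based, as in Fig. 3), and each newly seen position receives the next 'User n' label. Treating every distinct grid cell as a distinct speaker is an assumption that only holds while participants stay in place; `designate` and `SpeakerRegistry` are hypothetical names.

```python
from typing import Dict, Sequence


def designate(beam_outputs: Sequence[float]) -> int:
    """Return the 1-based grid position with the strongest beamforming output."""
    return max(range(len(beam_outputs)), key=lambda i: beam_outputs[i]) + 1


class SpeakerRegistry:
    """Gives 'User 1', 'User 2', ... to grid positions in order of first designation."""

    def __init__(self) -> None:
        self._labels: Dict[int, str] = {}

    def label_for(self, position: int) -> str:
        if position not in self._labels:
            self._labels[position] = f"User {len(self._labels) + 1}"
        return self._labels[position]


registry = SpeakerRegistry()
print(registry.label_for(designate([0.1, 0.2, 0.9, 0.3, 0.1])))  # prints: User 1
print(registry.label_for(1))  # prints: User 2
print(registry.label_for(3))  # prints: User 1
```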

FIG. 5 is a diagram for explaining log information generation and automatic minutes writing according to the present invention.

The log generation unit 360 sequentially records the text to generate log information. The chat log in FIG. 5 shows the conversation content and conversation time recorded for each user (speaker), with the entries recorded sequentially in the order of the conversation.

The minutes writing unit 370 generates minutes information from the log information. A save button is provided at the top of the chat log in FIG. 5; when the user clicks it, the log information recorded in the chat log is saved, producing the minutes.
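A minimal sketch of the log generation unit 360 and the save action of the minutes writing unit 370, assuming each log entry holds a timestamp, a speaker label, and the recognized text, and that the saved minutes are plain text. The entry format and the names `LogEntry`, `ChatLog`, `to_minutes`, and `save` are hypothetical; the exact layout of the chat log in Fig. 5 is not reproduced in this text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class LogEntry:
    time: datetime
    speaker: str
    text: str


@dataclass
class ChatLog:
    entries: List[LogEntry] = field(default_factory=list)

    def append(self, speaker: str, text: str, time: datetime) -> None:
        """Record one recognized utterance for a speaker."""
        self.entries.append(LogEntry(time, speaker, text))

    def to_minutes(self, title: str) -> str:
        """Render the log as plain-text minutes in conversation (time) order."""
        lines = [title, ""]
        for e in sorted(self.entries, key=lambda e: e.time):  # stable for equal times
            lines.append(f"[{e.time:%H:%M:%S}] {e.speaker}: {e.text}")
        return "\n".join(lines)

    def save(self, path: str, title: str = "Minutes") -> None:
        """What pressing the save button might do: write the minutes to a file."""
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.to_minutes(title) + "\n")


log = ChatLog()
log.append("User 1", "Let's begin.", datetime(2024, 1, 1, 10, 0, 0))
log.append("User 2", "Agreed.", datetime(2024, 1, 1, 10, 0, 5))
print(log.to_minutes("Meeting"))
```

Sorting by timestamp when rendering keeps the minutes in conversation order even if recognition results for overlapping utterances arrive out of order.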

100: sound sensing unit  200: image capturing unit
300: sound processing unit  310: sensing area setting unit
320: sound position identifying unit  321: beam power calculation unit
322: beamforming unit  323: sound position designating unit
330: speaker designating unit  340: text conversion unit
350: mapping unit  360: log generation unit
370: minutes writing unit  400: image output unit

Claims

1. An apparatus for automatically taking minutes using a microphone array, comprising:
a sound sensing unit that senses sound through the microphone array;
a sensing area setting unit that sets a sensing area of the sound sensing unit and sets grid position coordinates of the sensing area in order to recognize the position of sound generated within the sensing area;
a sound position identifying unit that identifies a sound position from the position coordinates of the sound sensed in the sensing area;
a speaker designating unit that designates the sound position as a speaker; and
a text conversion unit that recognizes the sound of the speaker designated by the speaker designating unit as the speaker's voice and converts it into text.

2. The apparatus of claim 1, further comprising:
an image capturing unit that captures a specific area;
a mapping unit that maps the sound position to the image information captured by the image capturing unit; and
an image output unit that displays the text in real time on the image information to which the sound position is mapped.

3. The apparatus of claim 2, further comprising:
a log generation unit that sequentially records the text to generate log information; and
a minutes writing unit that generates minutes information from the log information.

4. The apparatus of claim 1, wherein the sound position identifying unit comprises:
a beam power calculation unit that sums the sound signals received for each grid position coordinate to calculate a beam power level corresponding to each grid position coordinate;
a beamforming generation unit that applies to the beam power level a weight that minimizes the variance of the output, thereby generating a beamforming output value for sound source extraction; and
a sound position designating unit that designates, as the sound position, the coordinates of the position at which the beamforming output value is strongest.