KR101592518B1

KR101592518B1 - Method for online conference based on synchronization of voice signal, voice signal synchronization processing device for online conference, and recording medium for performing the method

Info

Publication number: KR101592518B1
Application number: KR1020140112554A
Authority: KR
Inventors: 조정훈
Original assignee: 경북대학교 산학협력단
Priority date: 2014-08-27
Filing date: 2014-08-27
Publication date: 2016-02-05

Abstract

Disclosed are a method for an online conference based on synchronization of a voice signal, a voice signal synchronization processing device for an online conference, and a recording medium for performing the same. According to the method for an online conference based on synchronization of a voice signal, a voice signal synchronization processing device receives a first synchronization signal including information about a sound signal and a speaking time point of the sound signal from a speaker terminal, extracts a voice signal of a speaker included in the sound signal based on post-processing of the sound signal, and transmits a second synchronization signal determined based on the voice signal and the first synchronization signal to a listener terminal. The second synchronization signal can determine a play time of the voice signal in the listener terminal.

Description

TECHNICAL FIELD The present invention relates to an online conference method based on synchronization of a voice signal, a voice signal synchronization processing apparatus for an online conference, and a recording medium for performing the same.

The present invention relates to an online conference method based on synchronization of a voice signal, a voice signal synchronization processing apparatus for an online conference, and a recording medium for performing the same, and more particularly, to an online conference method based on synchronization of a voice signal for delivering a voice signal generated by a speaker to a listener, a voice signal synchronization processing apparatus for an online conference, and a recording medium for performing the same.

As organizations become more specialized and structured, decision-making methods for communication among members and for reaching rational outcomes have advanced considerably. Corporate managers spend much of their time in meetings held to make decisions. The share of time spent in such meetings grows as the business environment becomes more complex and uncertain. Therefore, to raise the productivity of managers, a meeting support system that allows meetings to be conducted efficiently is needed.

A representative example is the online conference, which overcomes the spatial constraints that are a drawback of conventional meetings. Thanks to advances in communication technology and engineering, online conferences held between remote locations are widely used in place of traditional offline meetings. An online conference system should be able to manage many participants by relaying between multiple conference rooms that are far apart geographically or in time.

KR 10-2009-0008832

One aspect of the present invention provides an online conference method based on synchronization of a voice signal for delivering a voice signal generated by a speaker to a listener.

Another aspect of the present invention provides a voice signal synchronization processing apparatus for an online conference for delivering a voice signal generated by a speaker to a listener.

In an online conference method based on synchronization of a voice signal according to one aspect of the present invention, a voice signal synchronization processing apparatus receives, from a speaker terminal, a sound signal and a first synchronization signal including information on the utterance time point of the sound signal; the voice signal synchronization processing apparatus extracts the voice signal of the speaker included in the sound signal based on post-processing of the sound signal; and the voice signal synchronization processing apparatus transmits a second synchronization signal determined based on the voice signal and the first synchronization signal to a listener terminal, wherein the second synchronization signal may determine the playback time of the voice signal at the listener terminal.

The post-processing may remove an external noise signal from the sound signal and extract, as the voice signal, a matching voice signal in the sound signal that matches the voice characteristic information of the speaker.

In addition, removing the external noise signal from the sound signal may consist of removing signals other than human voice signals from the sound signal.

In addition, the voice characteristic information may include the frequency band, amplitude, and utterance characteristic information of the speaker's voice.

In addition, when the voice signal synchronization processing apparatus receives, from each of a plurality of terminals, sound signals whose voice characteristic information is identical except for amplitude, it may determine the terminal from which the sound signal with the largest amplitude was received to be the speaker terminal and the remaining terminals to be listener terminals.

In addition, the voice signal synchronization processing apparatus may analyze the voice characteristics of the sound signal having the largest amplitude and store them, matched to the speaker terminal, as the voice characteristic information of the speaker.

In addition, when the voice signal synchronization processing apparatus receives the sound signal and another sound signal on the same time resource from the speaker terminal and another speaker terminal, respectively, it may adjust the time synchronization of each of the sound signal and the other sound signal, and the second synchronization signal may include the adjusted time synchronization for the sound signal.

In addition, when the start point of the sound signal is earlier than the start point of the other sound signal, the time synchronization of the sound signal may be adjusted to an earlier time in consideration of the sound signal that precedes it, and when the start point of the sound signal is later than the start point of the other sound signal, the time synchronization of the sound signal may be adjusted to a later time in consideration of the sound signal that follows it.

In addition, the voice signal synchronization processing apparatus may convert the voice signal into text information, divide the voice signal into a plurality of sub voice signal units based on the text information, and assign tag information to each of the plurality of sub voice signal units based on the text information.

In addition, the method may be embodied as a computer-readable recording medium on which a computer program for providing an online conference service is recorded.

A voice signal synchronization processing apparatus for an online conference according to another aspect of the present invention includes a processor, and the processor may be implemented to receive, from a speaker terminal, a sound signal and a first synchronization signal including information on the utterance time point of the sound signal, to extract the voice signal of the speaker included in the sound signal based on post-processing of the sound signal, and to transmit a second synchronization signal determined based on the voice signal and the first synchronization signal to a listener terminal, wherein the second synchronization signal may determine the playback time of the voice signal at the listener terminal.

The post-processing may remove an external noise signal from the sound signal and extract, as the voice signal, a matching voice signal in the sound signal that matches the voice characteristic information of the speaker.

In addition, removing the external noise signal from the sound signal may consist of removing signals other than human voice signals from the sound signal.

In addition, the voice characteristic information may include the frequency band, amplitude, and utterance characteristic information of the speaker's voice.

In addition, when the processor receives, from each of a plurality of terminals, sound signals whose voice characteristic information is identical except for amplitude, it may determine the terminal from which the sound signal with the largest amplitude was received to be the speaker terminal and the remaining terminals to be listener terminals.

In addition, the processor may analyze the voice characteristics of the sound signal having the largest amplitude and store them, matched to the speaker terminal, as the voice characteristic information of the speaker.

In addition, when the processor receives the sound signal and another sound signal on the same time resource from the speaker terminal and another speaker terminal, respectively, it may adjust the time synchronization of each of the sound signal and the other sound signal, and the second synchronization signal may include the adjusted time synchronization for the sound signal.

In addition, when the start point of the sound signal is earlier than the start point of the other sound signal, the processor may adjust the time synchronization of the sound signal to an earlier time in consideration of the sound signal that precedes it, and when the start point of the sound signal is later than the start point of the other sound signal, the processor may adjust the time synchronization of the sound signal to a later time in consideration of the sound signal that follows it.

In addition, the processor may convert the voice signal into text information, divide the voice signal into a plurality of sub voice signal units based on the text information, and assign tag information to each of the plurality of sub voice signal units based on the text information.

In addition, the tag information may be determined based on the frequency of use of words included in the text information.
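By way of non-limiting illustration, the frequency-based tagging described above might be sketched as follows. The function name `tag_for_unit`, the whitespace tokenization, and the optional stop-word set are assumptions made for this example only; the specification does not state how words are tokenized or how ties are resolved.

```python
from collections import Counter


def tag_for_unit(text, stopwords=frozenset()):
    """Pick a tag for one sub voice signal unit from its transcribed text.

    Assumption: the tag is simply the most frequently used word in the unit,
    after dropping any caller-supplied stop words.
    """
    words = [w.lower() for w in text.split() if w.lower() not in stopwords]
    if not words:
        return None  # nothing usable to tag with
    # most_common(1) returns [(word, count)] for the highest count
    return Counter(words).most_common(1)[0][0]


units = ["budget review and budget plan", "schedule for the next schedule check"]
tags = [tag_for_unit(u, stopwords={"and", "for", "the"}) for u in units]
```

A listener-side search over the tags could then jump directly to the sub voice signal unit of interest instead of scanning the whole recording.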

According to the aspect of the present invention described above, a voice signal generated by a speaker in an online conference can be accurately synchronized and delivered to a listener. In addition, by adjusting the time synchronization of voice signals generated in the online conference, the listener can receive the speaker's voice signal clearly. Furthermore, a continuous voice signal can be analyzed, divided into a plurality of sub voice signal units, and provided to the listener so that the listener can easily search for a particular voice signal within the continuous voice signal.

FIG. 1 is a conceptual diagram illustrating a method of synchronizing voice information in an online conference according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram illustrating a voice signal processing method according to an embodiment of the present invention.
FIG. 3 is a conceptual diagram illustrating a method of determining the voice signal of a speaker in the voice signal synchronization processing apparatus shown in FIG. 2.
FIG. 4 is a conceptual diagram illustrating a method of processing an overlapping band of voice signals in the voice signal synchronization processing apparatus shown in FIG. 2.
FIG. 5 is a conceptual diagram illustrating a voice signal processing method of the voice signal synchronization processing apparatus shown in FIG. 2.
FIG. 6 is a conceptual diagram illustrating a voice signal processing method of the voice signal synchronization processing apparatus shown in FIG. 2.
FIG. 7 is a conceptual diagram illustrating a voice processing method of the voice signal synchronization processing apparatus shown in FIG. 2.
FIG. 8 is a block diagram illustrating the voice signal synchronization processing apparatus shown in FIG. 2.

The following detailed description of the invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention, although different from one another, need not be mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present invention, if properly described, is limited only by the appended claims along with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several views.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a conceptual diagram illustrating a method of synchronizing voice information in an online conference according to an embodiment of the present invention.

FIG. 1 discloses a method by which voice information generated in an online conference is synchronized between the speaker terminal 110 and the listener terminal 120.

Referring to FIG. 1, a voice signal generated by a speaker may be transmitted, together with a synchronization signal, to a synchronization server and a media server. The synchronization signal may include information on the utterance time of the voice signal. According to an embodiment of the present invention, voice signals may be aligned on the time axis and processed based on the synchronization signal. Hereinafter, the embodiments of the present invention assume that the synchronization signal is generated at the speaker terminal 110, but the synchronization signal for a received voice signal may instead be generated at the synchronization server 100.

The synchronization server 100 may be implemented to synchronize a received voice signal, based on the synchronization signal matched to the voice signal generated by the speaker terminal 110, and to transmit it to the listener terminal 120.

The media server 150 may be a server for processing the sound signal received from the speaker terminal and delivering it to the listener terminal 120. The sound signal received from the speaker terminal 110 may include an external noise signal as well as the voice signal of the speaker. The media server 150 may extract only the voice signal of the speaker from the sound signal received from the speaker terminal 110. This is described later.

The synchronization server 100 and the media server 150 may match the synchronization signal and the voice signal and transmit them to the listener terminal 120. The synchronization signal transmitted to the listener terminal 120 may include identification information indicating which voice signal it is the synchronization signal for. Likewise, the voice signal transmitted to the listener terminal 120 may include identification information on the synchronization signal matched to it. That is, the voice signal and the synchronization signal may be matched to each other and transmitted to the listener terminal 120. Alternatively, the synchronization server 100 and the media server 150 may transmit the voice signal to the listener terminal 120 with the synchronization signal included in the transmitted voice signal.

The listener terminal 120 may determine, based on the synchronization signal, the position on the time axis at which the received voice signal is to be played. Specifically, the listener terminal 120 may determine the playback time of the received voice signal based on the synchronization signal and may play the voice signal at the determined playback time.
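By way of non-limiting illustration, the listener-side scheduling might be sketched as follows. The `SyncSignal` fields, the `playback_schedule` function, and the fixed `playout_delay` are assumptions made for this example; the specification states only that the synchronization signal carries utterance-time information and identifies the voice signal it belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncSignal:
    signal_id: str         # identifies the voice signal this sync signal is matched to
    utterance_time: float  # seconds since the conference started (assumed unit)


def playback_schedule(sync_signals, playout_delay=0.2):
    """Return (signal_id, playback_time) pairs ordered on the time axis.

    Assumption: playback time is the utterance time plus a constant
    playout delay that absorbs transmission and processing latency.
    """
    ordered = sorted(sync_signals, key=lambda s: s.utterance_time)
    return [(s.signal_id, s.utterance_time + playout_delay) for s in ordered]


# Signals may arrive out of order; the schedule restores utterance order.
received = [SyncSignal("b", 2.0), SyncSignal("a", 1.0)]
print(playback_schedule(received, playout_delay=0.5))  # [('a', 1.5), ('b', 2.5)]
```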

Hereinafter, an apparatus, such as the synchronization server 100 and the media server 150, that processes and synchronizes a sound signal received from a particular user terminal and transmits it to at least one other user terminal is referred to as the voice signal synchronization processing apparatus 200.

That is, the voice signal synchronization processing apparatus 200 may receive, from the speaker terminal 110, a sound signal and a first synchronization signal including information on the utterance time point of the sound signal. The voice signal synchronization processing apparatus 200 may extract the voice signal of the speaker included in the sound signal based on post-processing of the sound signal. In addition, the voice signal synchronization processing apparatus 200 may transmit a second synchronization signal determined based on the voice signal and the first synchronization signal to the listener terminal 120, and the second synchronization signal may determine the playback time of the voice signal at the listener terminal 120.

FIG. 2 is a conceptual diagram illustrating a voice signal processing method according to an embodiment of the present invention.

FIG. 2 discloses a method by which the voice signal synchronization processing apparatus 200 removes an external noise signal from the sound signal received from the speaker terminal.

Specifically, the voice signal synchronization processing apparatus 200 may perform post-processing that removes the portion corresponding to an external noise signal from the sound signal received from a particular speaker terminal and extracts only the voice signal of the speaker. The external noise signal may be the part of the sound signal transmitted by the speaker terminal other than the voice signal of the speaker.

When an online conference is in progress and the external noise signal included in the sound signal transmitted by the speaker terminal exceeds a certain level, the listener's ability to recognize the speaker's voice may decrease. Likewise, when the voice signals of multiple speakers, rather than ambient noise, overlap, the listener's ability to recognize each speaker's voice may decrease. Therefore, the voice signal synchronization processing apparatus 200 needs to remove such external noise signals from the sound signal received from the speaker terminal.

The voice signal synchronization processing apparatus 200 may use the voice characteristic information of the speaker to extract only the voice signal of the speaker from the received sound signal. For example, the voice signal synchronization processing apparatus 200 may determine the loudest sound signal in the received sound signal (that is, the sound signal with the largest amplitude) to be the voice signal of the speaker. The voice signal synchronization processing apparatus 200 may extract characteristic information of the speaker's voice signal (signal amplitude, signal frequency, and so on) based on analysis of the determined voice signal. Based on the extracted characteristic information, the voice signal synchronization processing apparatus 200 may extract only the signal corresponding to the speaker's voice signal from the sound signal and recognize the remaining signals as external noise signals and filter them out.
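By way of non-limiting illustration, extraction of rough characteristic information from one frame of samples might be sketched as follows. The function name `voice_features`, the use of RMS alongside peak amplitude, and the zero-crossing frequency estimate are assumptions made for this example; the specification names amplitude and frequency as characteristics but does not prescribe how they are measured.

```python
import math


def voice_features(samples, sample_rate):
    """Rough characteristic information for one mono frame of samples.

    Assumptions: amplitude is summarized by peak and RMS values, and
    frequency is estimated from the zero-crossing count, which is only
    meaningful for a roughly periodic frame.
    """
    if not samples:
        raise ValueError("frame must contain at least one sample")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    # Count sign changes between neighbouring samples
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    # A periodic signal crosses zero twice per cycle
    freq_hz = crossings * sample_rate / (2 * len(samples))
    return {"peak": peak, "rms": rms, "freq_hz": freq_hz}


# One second of a 100 Hz tone sampled at 8 kHz
frame = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
features = voice_features(frame, 8000)
```

Features stored this way could later be compared against incoming frames so that components that do not match the stored profile are treated as external noise.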

The voice signal synchronization processing apparatus 200 may make various determinations to identify the voice signal of the speaker within the sound signal input through the speaker terminal.

FIG. 3 is a conceptual diagram illustrating a method of determining the voice signal of a speaker in the voice signal synchronization processing apparatus shown in FIG. 2.

FIG. 3 discloses a method by which the voice signal synchronization processing apparatus 200 avoids recognizing another person's voice signal as the voice signal of the speaker.

Referring to FIG. 3, when a speaker produces a voice signal, the voice signal is not only input to the speaker terminal and transmitted to the voice signal synchronization processing apparatus 200, but may also be input to the terminals of listeners near the speaker. In this case, the voice signal synchronization processing apparatus 200 may receive sound signals from a plurality of terminals.

As a specific example, suppose that, because of the speaker's utterance, a sound signal enters not only the speaker terminal but also the listener terminals of two listeners adjacent to the speaker (a first listener terminal 310 and a second listener terminal 320). The voice signal synchronization processing apparatus 200 may receive sound signals from the speaker terminal 300, the first listener terminal 310, and the second listener terminal 320. In this case, the voice signal synchronization processing apparatus 200 may recognize the terminal that transmitted the sound with the largest amplitude as the speaker terminal 300 and recognize the remaining terminals, which transmitted sound with relatively smaller amplitude, as the listener terminals 310 and 320 rather than the speaker's terminal.
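By way of non-limiting illustration, the amplitude comparison across terminals might be sketched as follows. The function `classify_terminals` and the use of RMS as the amplitude measure are assumptions made for this example, and the caller is assumed to have already confirmed that the frames carry the same utterance (identical voice characteristics apart from amplitude).

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def classify_terminals(frames):
    """Split terminals into one speaker terminal and the listener terminals.

    frames maps a terminal id to the samples that terminal captured for
    the same utterance; the terminal with the largest amplitude is taken
    to be the speaker terminal.
    """
    speaker = max(frames, key=lambda tid: rms(frames[tid]))
    listeners = sorted(tid for tid in frames if tid != speaker)
    return speaker, listeners


frames = {
    "300": [0.8, -0.8, 0.8, -0.8],  # closest to the person speaking
    "310": [0.2, -0.2, 0.2, -0.2],
    "320": [0.1, -0.1, 0.1, -0.1],
}
print(classify_terminals(frames))  # ('300', ['310', '320'])
```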

Based on this information, the voice signal synchronization processing apparatus 200 may match the voice characteristic information of the speaker to the speaker terminal and extract only the voice signal of the speaker from the sound signal input by the speaker terminal 300. Conversely, the current speaker's voice signal that is subsequently input to the listener terminals 310 and 320 may be recognized as another person's voice signal and removed.

FIG. 4 is a conceptual diagram illustrating a method of processing an overlapping band of voice signals in the voice signal synchronization processing apparatus shown in FIG. 2.

FIG. 4 discloses a method by which the voice signal synchronization processing apparatus simultaneously recognizes and processes the voice signals of a plurality of speakers.

In a conference venue, voice signals may be produced by multiple speakers at the same time. For example, a first speaker may produce a first voice signal in a first time band and a second speaker may produce a second voice signal in a second time band, with the first time band and the second time band overlapping. The band within the first time band and the second time band in which the first voice signal and the second voice signal overlap is referred to as the overlapping band.
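By way of non-limiting illustration, the overlapping band of two time bands can be computed as an interval intersection. The function name `overlapping_band` and the representation of a time band as a `(start, end)` pair in seconds are assumptions made for this example.

```python
def overlapping_band(first, second):
    """Return the (start, end) band where two time bands overlap, or None.

    Each band is a (start, end) pair with start < end.
    """
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return (start, end) if start < end else None


print(overlapping_band((0.0, 5.0), (3.0, 8.0)))  # (3.0, 5.0)
print(overlapping_band((0.0, 1.0), (2.0, 3.0)))  # None
```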

이러한 경우, 음성 신호 동기화 처리 장치(200)는 중첩 대역에서 제1 음성 신호와 제2 음성 신호 각각을 처리할 수 있고, 중첩 대역에서 제1 음성 신호 및 제2 음성 신호가 모두 출력될 수 있다. 음성 신호 동기화 처리 장치(200)는 제1 음성 신호와 제2 음성 신호를 멀티플렉싱하여 처리할 수 있고, 청취자 단말은 멀티플렉싱된 음성 신호를 수신하여 제1 음성 신호와 제2 음성 신호를 모두 청취할 수 있다. In this case, the apparatus for synchronizing the voice signal 200 can process the first voice signal and the second voice signal in the overlapping band, and both the first voice signal and the second voice signal can be output in the overlapping band. The apparatus for synchronizing the voice signal 200 can multiplex and process the first and second voice signals and the listener terminal can receive the multiplexed voice signal to listen to both the first voice signal and the second voice signal have.

When multiplexing the first voice signal and the second voice signal, the voice signal synchronization processing device 200 may perform post-processing on the first voice signal and the second voice signal.

For example, the voice signal synchronization processing device 200 can control the volume of the first voice signal and the second voice signal. When the volume of the first voice signal is excessively large relative to the volume of the second voice signal, the voice signal synchronization processing device 200 can reduce the volume of the first voice signal and raise the volume of the second voice signal, thereby adjusting the volume ratio between the first voice signal and the second voice signal.
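The volume adjustment described above can be sketched as follows. This is only an illustrative sketch: the use of RMS level as the measure of volume, the `max_ratio` threshold, and the function names `rms` and `balance_volumes` are assumptions made for this example and are not specified in the patent.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def balance_volumes(first, second, max_ratio=2.0):
    """Return scaled copies of both signals whose RMS ratio is at most max_ratio.

    max_ratio is a hypothetical tuning parameter, not a value from the patent.
    """
    r1, r2 = rms(first), rms(second)
    if r1 == 0.0 or r2 == 0.0:
        return list(first), list(second)  # nothing to balance against silence
    if max(r1, r2) / min(r1, r2) <= max_ratio:
        return list(first), list(second)  # already within the allowed ratio
    # Lower the louder signal and raise the quieter one around their geometric mean.
    mean = math.sqrt(r1 * r2)
    loud_target = mean * math.sqrt(max_ratio)
    quiet_target = mean / math.sqrt(max_ratio)
    t1, t2 = (loud_target, quiet_target) if r1 > r2 else (quiet_target, loud_target)
    return [s * t1 / r1 for s in first], [s * t2 / r2 for s in second]
```

Scaling both signals toward their geometric mean is one possible design choice; it reduces the louder signal and raises the quieter one at the same time, as in the example above, rather than changing only one of them.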

Alternatively, the voice signal synchronization processing device 200 may adjust the time synchronization of at least one of the first voice signal and the second voice signal so that the overlapping band of the first voice signal and the second voice signal is minimized.

FIG. 5 is a conceptual diagram illustrating a voice signal processing method of the voice signal synchronization processing device shown in FIG. 2.

FIG. 5 discloses a method of processing voice signals in the overlapping band when an overlapping band arises because voice signals generated by a plurality of speakers overlap.

Referring to FIG. 5, a first speaker may generate a first voice signal in a first time band and a second speaker may generate a second voice signal in a second time band, and the first time band and the second time band may overlap to produce an overlapping band.

The voice signal synchronization processing device can adjust the time synchronization of the first voice signal and the second voice signal so that the overlapping band of the first voice signal and the second voice signal is minimized.

For example, assume that the start point of the first voice signal is earlier than that of the second voice signal. The voice signal synchronization processing device can shift the timing of the first voice signal somewhat earlier, taking into account the position of the previous voice signal that precedes the first voice signal.

For example, when the time interval between the first voice signal and the previous voice signal is 1 second, the time synchronization of the first voice signal may be advanced by 0.5 seconds. In this case, the time interval between the first voice signal and the previous voice signal becomes 0.5 seconds, and the overlapping band of the first voice signal and the second voice signal is reduced by 0.5 seconds.

Conversely, when the time interval between the second voice signal and the voice signal that follows it is 2 seconds, the time synchronization of the second voice signal may be delayed by 1 second. In this case, the time interval between the second voice signal and the following voice signal becomes 1 second, and the overlapping band of the first voice signal and the second voice signal is reduced by 1 second.

When the time synchronization of the voice signals is adjusted in the overlapping band as described above, if the overlapping band of the first voice signal and the second voice signal was originally 2 seconds, it is reduced by 1.5 seconds to 0.5 seconds. That is, the overlapping signals are spread apart as far as possible, and the intelligibility of the voice signal of each of the plurality of speakers can be increased.
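The arithmetic of this example can be checked with a short sketch. The `Segment` type, the `spread` function, and the `fraction` parameter are hypothetical names introduced for illustration. The value 0.5 is inferred from the worked example (a 1-second gap yields a 0.5-second shift, a 2-second gap yields a 1-second shift); the patent does not state a general rule.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds

def overlap(a, b):
    """Length in seconds of the interval shared by two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def spread(first, second, prev_end, next_start, fraction=0.5):
    """Shift `first` earlier and `second` later to shrink their overlap.

    Assumes `first` starts before `second`. `fraction` is the share of each
    neighbouring gap that may be used (0.5 mirrors the worked example).
    """
    remaining = overlap(first, second)
    # Move the first signal toward the previous signal, using part of that gap.
    shift_first = min(fraction * max(0.0, first.start - prev_end), remaining)
    remaining -= shift_first
    # Move the second signal toward the following signal, using part of that gap.
    shift_second = min(fraction * max(0.0, next_start - second.end), remaining)
    return (
        Segment(first.start - shift_first, first.end - shift_first),
        Segment(second.start + shift_second, second.end + shift_second),
    )

# Worked example: 1 s gap before the first signal, 2 s gap after the second,
# and an initial overlap of 2 s.
first, second = Segment(10.0, 15.0), Segment(13.0, 18.0)
new_first, new_second = spread(first, second, prev_end=9.0, next_start=20.0)
print(overlap(new_first, new_second))  # 0.5
```

Each shift is capped at the remaining overlap so that the two signals are never moved further apart than needed.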

FIG. 6 is a conceptual diagram illustrating a voice signal processing method of the voice signal synchronization processing device shown in FIG. 2.

FIG. 6 discloses a method of generating a conference record based on voice signals.

Referring to FIG. 6, the voice signal synchronization processing device 200 can automatically generate meeting minutes based on voice signals. The voice signal synchronization processing device 200 can convert a voice signal into text through voice analysis. The transcribed voice signal can be classified by topic.

For example, a conference record may be generated in consideration of the frequency of words uttered through the plurality of speaker terminals. While a first topic is being discussed, various words related to the first topic are likely to be uttered repeatedly by the speakers. Likewise, while a second topic is being discussed, various words related to the second topic are likely to be uttered repeatedly by the speakers.

Based on the usage frequency of the uttered words, the voice signal synchronization processing device 200 can identify, within the continuous voice signal, a first lower voice signal unit related to the first topic and a second lower voice signal unit related to the second topic. In addition, tag information for frequently used words may be assigned to each lower voice signal unit.

With this method, the voice signal synchronization processing device 200 can divide the continuous voice signal into a plurality of lower voice signal units, each matched with tag information.
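One way to picture this frequency-based division is the sketch below. The patent does not say how topic boundaries are detected, so the fixed window of utterances, the stop-word list, and the rule of merging adjacent windows that share the same top word are all assumptions made for illustration. The names `top_words` and `segment_by_topic` are hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "is", "we", "to", "of", "and", "our", "for", "on"}

def top_words(texts, k=1):
    """Return the k most frequent non-stop-words in a list of transcribed utterances."""
    words = [w for t in texts for w in t.lower().split() if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]

def segment_by_topic(utterances, window=2, k=1):
    """Group transcribed utterances into tagged units.

    Each window of utterances is tagged with its most frequent words, and
    adjacent windows with identical tags are merged into one unit.
    """
    units = []
    for i in range(0, len(utterances), window):
        chunk = utterances[i:i + window]
        tags = top_words(chunk, k)
        if units and units[-1]["tags"] == tags:
            units[-1]["utterances"].extend(chunk)
        else:
            units.append({"tags": tags, "utterances": list(chunk)})
    return units

utterances = [
    "the budget is tight", "we cut the budget",
    "budget review on friday", "budget approved",
    "schedule the launch", "launch is delayed",
]
for unit in segment_by_topic(utterances):
    print(unit["tags"], len(unit["utterances"]))
```

In this sketch each utterance is plain text; in practice each one would also carry the start and end time of the corresponding part of the voice signal, so that a tagged unit maps back to audio.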

The voice signal synchronization processing device may also search the continuous voice signal for specific content based on the transcribed voice signal.

For example, after the online conference ends, a listener may want to review the part of the conference in which specific content was discussed. In this case, the user can input search text information to the voice signal synchronization processing device. The search text information may be information used for searching, such as a word or sentence related to the specific content, or its surrounding context.

The voice signal synchronization processing device 200 can match the search text information received from the user against the transcribed voice information and provide the most similar voice signal to the user. With this method, the user can selectively extract and listen to only the required voice signal from the continuous voice signal.
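The matching step might look like the following sketch. The patent does not name a similarity measure; Jaccard similarity over word sets is used here purely as a simple stand-in, and the transcript layout (`start`, `end`, `text`) and the function name `best_match` are assumptions.

```python
def best_match(query, transcript):
    """Return the transcript entry whose text is most similar to the query.

    Similarity is the Jaccard index of the two word sets (an assumed measure).
    Each entry is a dict with "start", "end" (seconds) and "text".
    """
    query_words = set(query.lower().split())

    def score(entry):
        words = set(entry["text"].lower().split())
        union = query_words | words
        return len(query_words & words) / len(union) if union else 0.0

    return max(transcript, key=score)

transcript = [
    {"start": 0.0, "end": 4.0, "text": "welcome everyone to the meeting"},
    {"start": 4.0, "end": 9.5, "text": "the launch date moves to march"},
    {"start": 9.5, "end": 12.0, "text": "any questions"},
]
hit = best_match("launch date", transcript)
print(hit["start"], hit["end"])  # 4.0 9.5
```

The returned start and end times identify which part of the continuous voice signal would be played back to the user.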

FIG. 7 is a conceptual diagram illustrating a voice processing method of the voice signal synchronization processing device shown in FIG. 2.

FIG. 7 discloses a method by which the voice signal synchronization processing device 200 separates voice signals by speaker and transmits the edited information to a listener terminal.

Referring to FIG. 7, when a listener wants an edited voice signal containing only the voice signal of a specific speaker, the listener can input information about that speaker's voice signal. For example, the listener can record part of the specific speaker's voice signal and input it to the voice signal synchronization processing device 200, or select the portion of the transcribed voice signal described above that corresponds to the specific speaker's voice signal and input it to the voice signal synchronization processing device 200.

The voice signal synchronization processing device 200 can receive the information about the specific speaker's voice signal and extract only that speaker's voice signal from the continuous voice signal. The voice signal synchronization processing device 200 can re-edit the extracted voice signal of the specific speaker into a continuous voice signal. For example, the intervals between the voice signals generated by the specific speaker can be removed so that only that speaker's voice signal is played back continuously. The voice signal synchronization processing device 200 can transmit the edited voice signal, edited with respect to the specific speaker, to the listener.
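The interval removal can be illustrated on a list of already labelled segments. This sketch assumes that speaker identification has been done beforehand and that each segment is a dict with `speaker`, `start` and `end` fields; these names and the function `compact_speaker` are hypothetical.

```python
def compact_speaker(segments, speaker):
    """Keep only one speaker's segments and place them back to back.

    Returns new segments whose "start"/"end" are positions in the edited
    signal, while "source_start" points into the original recording.
    """
    edited, cursor = [], 0.0
    own = sorted((s for s in segments if s["speaker"] == speaker),
                 key=lambda s: s["start"])
    for seg in own:
        duration = seg["end"] - seg["start"]
        edited.append({
            "speaker": speaker,
            "source_start": seg["start"],
            "start": cursor,
            "end": cursor + duration,
        })
        cursor += duration  # the next segment begins where this one ends
    return edited

segments = [
    {"speaker": "A", "start": 0.0, "end": 3.0},
    {"speaker": "B", "start": 3.0, "end": 5.0},
    {"speaker": "A", "start": 8.0, "end": 10.0},
]
for seg in compact_speaker(segments, "A"):
    print(seg["source_start"], seg["start"], seg["end"])
```

Keeping `source_start` alongside the new timeline lets the edited signal be assembled by copying the corresponding audio ranges from the original recording.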

FIG. 8 is a block diagram of the voice signal synchronization processing device shown in FIG. 2.

Referring to FIG. 8, the voice signal synchronization processing device may include a synchronization signal generation unit 210, a multimedia information processing unit 220, a communication unit 230, and a processor 240. The multimedia information processing unit 220 may include an external noise removal unit 220-1, an overlapping voice signal processing unit 220-2, and a conference record generation unit 220-3.

Each component can perform the operations of the voice signal synchronization processing device disclosed in FIGS. 1 to 7. For example, each component can perform the following operations.

The synchronization signal generation unit 210 may be implemented to generate a synchronization signal for transmitting voice signals to a plurality of listener terminals in synchronization.

The multimedia information processing unit 220 may be implemented to post-process the sound signal transmitted by the speaker terminal, extract the speaker's voice signal, and transmit the extracted voice signal to at least one listener terminal. In addition, the multimedia information processing unit 220 can convert voice information into text, divide continuous voice information into a plurality of lower voice signal units, and assign tag information to each of the divided lower voice signal units.

The multimedia information processing unit 220 may include the external noise removal unit 220-1, the overlapping voice signal processing unit 220-2, and the conference record generation unit 220-3.

The external noise removal unit 220-1 may be implemented to remove external noise from the sound signal transmitted by the speaker terminal. The external noise removal unit 220-1 can extract characteristic information of the speaker's voice signal (signal amplitude, signal frequency, and so on) based on an analysis of the speaker's voice signal. Based on the extracted characteristic information, the external noise removal unit 220-1 may be implemented to extract from the input sound signal only the signal corresponding to the speaker's voice signal, and to treat all other signals as external noise and filter them out.
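A highly simplified sketch of this filtering idea follows. It assumes the sound signal has already been split into frames, each summarized by a dominant frequency and an amplitude; the `VoiceProfile` type, its fields, and the numeric values are illustrative assumptions, and a practical implementation would work on spectra rather than on two numbers per frame.

```python
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    """Hypothetical summary of a speaker's voice characteristic information."""
    min_hz: float
    max_hz: float
    min_amplitude: float

def filter_frames(frames, profile):
    """Keep frames that match the profile; everything else is treated as noise."""
    return [
        f for f in frames
        if profile.min_hz <= f["hz"] <= profile.max_hz
        and f["amplitude"] >= profile.min_amplitude
    ]

# Illustrative values only, not taken from the patent.
profile = VoiceProfile(min_hz=85.0, max_hz=255.0, min_amplitude=0.05)
frames = [
    {"hz": 150.0, "amplitude": 0.50},   # matches the speaker profile
    {"hz": 4000.0, "amplitude": 0.60},  # outside the frequency band
    {"hz": 180.0, "amplitude": 0.01},   # too quiet to be the speaker
]
print(filter_frames(frames, profile))
```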

The overlapping voice signal processing unit 220-2 may be implemented to adjust the time synchronization of the first voice signal and the second voice signal so that the overlapping band of the first voice signal and the second voice signal is minimized.

The conference record generation unit 220-3 may be implemented to convert a voice signal into text through voice analysis. The conference record generation unit 220-3 can classify the continuous voice signal into lower voice signal units according to the conference topic and assign tag information to each classified lower voice signal unit.

The communication unit 230 may be implemented to transmit or receive voice signals and sound signals to and from the speaker terminal and the listener terminals.

The processor 240 may be implemented to control the operation of the synchronization signal generation unit 210 and the multimedia information processing unit 220.

The technique for conducting an online conference described above may be implemented as an application, or in the form of program instructions executable through various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the invention has been described above with reference to embodiments, those skilled in the art will understand that various modifications and changes can be made to the present invention without departing from the spirit and scope of the invention as set forth in the claims below.

100: Synchronization server
110: Speaker terminal
120: Listener terminal
150: Media server
200: Voice synchronization processing device
300: Speaker terminal
310: First listener terminal
320: Second listener terminal
210: Synchronization signal generation unit
220: Multimedia information processing unit
220-1: External noise removal unit
220-2: Overlapping voice information processing unit
220-3: Conference record generation unit
230: Communication unit
240: Processor

Claims

An online conference method based on synchronization of voice signals, the method comprising:
receiving, by a voice signal synchronization processing device, from a speaker terminal, a first synchronization signal including a sound signal and information about a speaking time point of the sound signal;
extracting, by the voice signal synchronization processing device, a voice signal of a speaker included in the sound signal based on post-processing of the sound signal, the post-processing removing an external noise signal from the sound signal and extracting, as the voice signal, a matching voice signal that matches voice characteristic information of the speaker from the sound signal; and
transmitting, by the voice signal synchronization processing device, to a listener terminal, a second synchronization signal determined based on the voice signal and the first synchronization signal,
wherein the second synchronization signal determines a playback time of the voice signal at the listener terminal, and
wherein the voice signal synchronization processing device,
when voice signals having the same voice characteristic information are received from each of a plurality of terminals, determines the terminal from which the voice signal having the largest amplitude was received, among the plurality of terminals, to be the speaker terminal, and determines the remaining terminals to be listener terminals.

delete

The method according to claim 1,
wherein removing the external noise signal from the sound signal
comprises removing, from the sound signal, signals that are not human voice signals.

The method according to claim 3,
wherein the voice characteristic information
includes information on the frequency band, amplitude, and utterance characteristics of the speaker's voice.

delete

The method according to claim 1,
wherein the voice signal synchronization processing device
analyzes the voice characteristics of the voice signal having the largest amplitude and matches them to the speaker terminal as the voice characteristic information of the speaker.

The method according to claim 1,
wherein the voice signal synchronization processing device,
when the sound signal and another sound signal are received from the speaker terminal and another speaker terminal on the same time resource, adjusts the time synchronization of each of the sound signal and the other sound signal, and
wherein the second synchronization signal includes the adjusted time synchronization for the sound signal.

The method according to claim 7,
wherein the voice signal synchronization processing device
adjusts the time synchronization of the sound signal to an earlier time point, in consideration of a previous sound signal that precedes the sound signal, when the start point of the sound signal is earlier than the start point of the other sound signal, and
adjusts the time synchronization of the sound signal to a later time point, in consideration of a subsequent sound signal that follows the sound signal, when the start point of the sound signal is later than the start point of the other sound signal.

The method according to claim 1,
wherein the voice signal synchronization processing device
converts the voice signal into text information, divides the voice signal into a plurality of lower voice signal units based on the text information, and assigns tag information to each of the plurality of lower voice signal units based on the text information.

A computer-readable recording medium on which is recorded a computer program for providing an online conference service according to any one of claims 1, 3, 4, and 6 to 9.

A voice signal synchronization processing device for an online conference, the device comprising a processor,
wherein the processor is implemented to receive, from a speaker terminal, a first synchronization signal including a sound signal and information about a speaking time point of the sound signal,
to extract a voice signal of a speaker included in the sound signal based on post-processing of the sound signal, the post-processing removing an external noise signal from the sound signal and extracting, as the voice signal, a matching voice signal that matches voice characteristic information of the speaker from the sound signal,
and to transmit, to a listener terminal, a second synchronization signal determined based on the voice signal and the first synchronization signal,
wherein the second synchronization signal determines a playback time of the voice signal at the listener terminal, and
wherein, when voice signals having the same voice characteristic information are received from each of a plurality of terminals, the processor determines the terminal from which the voice signal having the largest amplitude was received, among the plurality of terminals, to be the speaker terminal, and determines the remaining terminals to be listener terminals.

delete

The device according to claim 11,
wherein removing the external noise signal from the sound signal
comprises removing, from the sound signal, signals that are not human voice signals.

The device according to claim 13,
wherein the voice characteristic information
includes information on the frequency band, amplitude, and utterance characteristics of the speaker's voice.

delete

The device according to claim 11,
wherein the processor
is implemented to analyze the voice characteristics of the voice signal having the largest amplitude and store them as the voice characteristic information of the speaker.

The device according to claim 11,
wherein the processor,
when the sound signal and another sound signal are received from the speaker terminal and another speaker terminal on the same time resource, adjusts the time synchronization of each of the sound signal and the other sound signal, and
wherein the second synchronization signal includes the adjusted time synchronization for the sound signal.

The device according to claim 17,
wherein the processor
adjusts the time synchronization of the sound signal to an earlier time point, in consideration of a previous sound signal that precedes the sound signal, when the start point of the sound signal is earlier than the start point of the other sound signal, and
adjusts the time synchronization of the sound signal to a later time point, in consideration of a subsequent sound signal that follows the sound signal, when the start point of the sound signal is later than the start point of the other sound signal.

The device according to claim 11,
wherein the processor is implemented to
convert the voice signal into text information,
divide the voice signal into a plurality of lower voice signal units based on the text information,
and assign tag information to each of the plurality of lower voice signal units based on the text information.

The device according to claim 19,
wherein the tag information is determined based on the usage frequency of words included in the text information.