KR100377571B1

KR100377571B1 - Apparatus for processing voice chatting data

Info

Publication number: KR100377571B1
Application number: KR10-2000-0063487A
Authority: KR
Inventors: 박강훈
Original assignee: 주식회사 솔피정보통신
Priority date: 2000-10-27
Filing date: 2000-10-27
Publication date: 2003-03-26
Also published as: KR20020032799A

Abstract

Disclosed is a multi-party voice chat data processing apparatus that synthesizes the voice data of the multiple users participating in a voice chat and then transmits and receives it over a single bandwidth, thereby significantly reducing voice data traffic on the network and increasing voice data processing efficiency.

According to the present invention, voice data received from multiple users is combined into single synthesized voice data, and the final synthesized data transmitted to each chat participant occupies a single channel of a single bandwidth, so the data processing load on each chat participant's terminal is significantly reduced. In addition, because every chat participant can process the voice data with only a single bandwidth, there is no need to limit the number of chat participants.

Description

Multi-party voice chat data processing device {Apparatus for processing voice chatting data}

The present invention relates to an apparatus for processing multi-party voice chat data, and more particularly to a multi-party voice chat data processing apparatus that synthesizes the voice data of the multiple users participating in a voice chat and then transmits and receives it over a single bandwidth, thereby significantly reducing voice data traffic on the network and increasing voice data processing efficiency.

When voice is transmitted and received over a digital network such as the Internet, the voice, being an analog signal, cannot be sent directly, so it is first converted into digital voice data and then transmitted. The bandwidth needed to transmit this digital data is determined by the sampling rate used to digitize the analog signal. A bandwidth of 64 kbps is typically required, and when higher sound quality is needed the sampling rate must be raised, which increases the amount of digital voice data transmitted and therefore demands even more network bandwidth. Accordingly, various digital compression schemes have been adopted to secure a stable transmission and reception bandwidth on networks that, like the Internet, do not provide a fixed speed. Examples of the international standards that define such schemes include 32kbps ADPCM, G.729, CELP 3.2a LPC-10, G711/721/723, G.728 LD-CELP, GSM 6.10, and MPEG-1 Layer3.

FIGS. 1 to 3 show conventional voice chat configurations.

FIG. 1 shows a 1:1 voice chat configuration, the most basic form of chat; a representative example is Internet telephony.

This configuration uses the most basic and common method of transmitting and receiving voice signals. It relies on digital voice compression and on queueing methods that secure a fixed bandwidth for stable voice playback over the Internet, which does not provide fixed communication throughput. Because of the nature of the 1:1 method, no processing problems arise from an increase in voice data traffic.

FIGS. 2 and 3 show voice chat configurations in which three or more people participate simultaneously. FIG. 2 is referred to as the chat server directory method and FIG. 3 as the chat server relay method.

In the chat server directory method shown in FIG. 2, the chat server provides only a directory function: it informs each user of the Internet addresses of the users participating in the chat (the dashed arrows in FIG. 2), while the actual voice data flows directly between the users, who distribute and synthesize it themselves (that is, the server only connects the users to one another over the network and takes no part in distributing or synthesizing the data). In this case, each user terminal must transmit its own voice data to every other chat participant and must also receive and synthesize all the voice data arriving from each of the other users. Consequently, as the number of chat participants grows, the transmission and reception bandwidth each user needs on the network increases in proportion to the number of members. Each solid arrow in FIG. 2 represents the bandwidth allocated to one user (12 kbps). As shown in FIG. 2, with four chat participants each user needs 48 kbps, four times the 12 kbps of the 1:1 method, for both transmission and reception. A typical user terminal (typically the communication card and voice processing card built into a PC) has only a finite bandwidth available for data processing, so the number of simultaneous chat participants is necessarily very limited. For example, if voice data transmission takes 12 kbps, in theory only about four people can chat simultaneously over a 56 kbps PSTN connection.
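The arithmetic in this paragraph can be checked with a small sketch. It follows the text's own accounting, in which each user needs one 12 kbps channel per participant (4 x 12 = 48 kbps); counting only the other participants would give a smaller figure. The function names are hypothetical and chosen only for illustration.

```python
def directory_mode_bandwidth_kbps(participants: int, channel_kbps: float = 12.0) -> float:
    """Per-user bandwidth in each direction, using the text's accounting
    of one channel per chat participant."""
    return participants * channel_kbps


def max_participants(link_kbps: float, channel_kbps: float = 12.0) -> int:
    """Largest participant count whose per-user bandwidth fits on the link."""
    return int(link_kbps // channel_kbps)


if __name__ == "__main__":
    print(directory_mode_bandwidth_kbps(4))  # 48.0 kbps for four participants
    print(max_participants(56))              # 4 participants on a 56 kbps link
```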

In the chat server relay method shown in FIG. 3, the chat server receives voice data from every user participating in the chat and relays it to all users. Each user terminal completes its transmission duty simply by sending the user's own voice data to the chat server (that is, transmission needs only a single bandwidth). However, because the voice data relayed by the chat server is sent redundantly to every other user, each user terminal must receive and synthesize the voice data sent by all the other participants and reproduce it as the original voices. The transmission bandwidth is therefore a constant one channel (12 kbps) for every user, but, as in FIG. 2, the reception bandwidth each user needs on the Internet increases in proportion to the number of members as more users join the chat. Moreover, the data traffic across the whole network grows exponentially as chat users are added, so an enormous amount of network processing capacity is required.

Existing multi-party real-time voice chat therefore has the problem that the number of simultaneous participants is necessarily very limited. Moreover, even over a high-speed leased line, network traffic grows exponentially with the number of users, so a slight increase in simultaneous participants can make simultaneous chat among even a few dozen people impossible.

The present invention was devised to solve these problems. Its object is to provide a multi-party voice chat data processing apparatus that synthesizes the voice data of the multiple users participating in a voice chat and relays it over a single bandwidth, thereby significantly reducing voice data traffic on the network and increasing voice data processing efficiency.

FIG. 1 is a configuration diagram of the 1:1 chat method.

FIG. 2 is a configuration diagram of the chat server directory method, one of the existing chat methods that use a voice chat server.

FIG. 3 is a configuration diagram of the chat server relay method, one of the existing chat methods that use a voice chat server.

FIG. 4 is a device configuration diagram and network configuration diagram of the present invention.

FIG. 5 is a conceptual configuration diagram in which the present invention is implemented.

To achieve this object, the present invention provides a multi-party voice chat data processing apparatus comprising a plurality of users participating in a voice chat and a chat server that synthesizes the voice data transmitted from the users to generate synthesized voice data of a single bandwidth and transmits it to each user, wherein the chat server comprises: a voice data correction/queueing module that, to correct the asynchrony of the per-user voice data arriving asynchronously over the network, assigns an adjustment factor value to each stream (an excess factor value to voice data whose incoming amount exceeds the amount that normally arrives during one reception synchronization signal interval of the apparatus, and a delay factor value to voice data whose incoming amount falls short of that amount), synchronizes the data accordingly, and queues it in a voice reception queue; a decompression module that decompresses the queued voice data of each user; a voice data synthesis module that synthesizes the decompressed voice data of the users to generate synthesized voice data; a final synthesized data extraction module that removes each user's own voice data from the synthesized voice data to extract the final synthesized data; a compression module that compresses the final synthesized data; and a synchronization module that calculates a corrected transmission synchronization signal interval reflecting the adjustment factor values.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 4 shows the device configuration and network configuration of the present invention.

Only the components of the user terminal 100 related to voice data processing are shown, and since these components are already well known, each function is described only briefly here.

The voice digitization unit 101 digitizes the user's analog voice. The digitized data takes the form of PCM (Pulse Code Modulation) code.

The voice data compression unit 102 compresses this PCM data, using one of the standards mentioned above.

The external communication unit 105 transmits the compressed data and receives voice data from the chat server 200; a typical example is the communication card built into a PC.

The voice data decompression unit 104 restores the voice data received from the chat server 200.

The voice analog conversion unit 103 converts the restored voice data back to analog form, producing the sound the user actually hears.

The voice data correction/queueing module 201 assigns an adjustment factor value to each user's voice data arriving asynchronously over the network 300, synchronizes the data accordingly, and queues it in the voice reception queue 202. In doing so it corrects the asynchrony between the users' voice data that results from transmission delays within the network 300 itself and from performance differences between the user terminals.

Queueing of this kind is widely used when a server processes data that arrives asynchronously. Because network delay or slight differences in the data generation synchronization signals of the user terminals mean that the same amount of data cannot be guaranteed to arrive from every user, the server keeps a reception queue of a fixed size for each channel allocated to a user and holds data there for a certain time. The queue buffers the asynchrony of data arrival caused by network delay so that the data can be processed synchronously. However, the differences between the users' data generation synchronization signals accumulate over time, so merely providing reception queues cannot completely eliminate the data shortages or excesses that occur at regular intervals.

In addition to this queueing technique, the present invention therefore assigns an adjustment factor value (either a delay factor value or an excess factor value) to each user's voice data in order to resolve data shortages and excesses and thus the asynchrony of the voice data arriving at the server.

First, a voice reception queue 202 of a fixed size is allocated to each user channel. The fixed size is the amount of voice data that should normally arrive at the server 200 during one interval of the server's voice data reception synchronization signal, and the server 200 writes each user's voice data to the voice reception queue 202 in units of this interval.

While writing (queueing) each user's voice data to the voice reception queue 202, the voice data correction/queueing module 201 assigns one of the two factor values to each user's voice data. For data received faster than the voice data reception synchronization signal interval of the server 200 (data whose amount exceeds the amount that normally arrives during one reception synchronization signal interval), it assigns an excess factor value, deletes the oldest data in the voice reception queue 202 to which that voice data is to be written, and writes the newly arrived voice data.
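The drop-oldest behaviour described for the voice reception queue can be sketched with a bounded deque. This is only an illustration under assumptions: the class name `VoiceReceiveQueue`, the frame granularity, and the choice to return the discarded frame are hypothetical and not taken from the patent.

```python
from collections import deque


class VoiceReceiveQueue:
    """Fixed-size per-user queue that discards its oldest frame when a new
    frame arrives while the queue is already full."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        """Store a frame and return the frame that was discarded, if any."""
        full = len(self._frames) == self._frames.maxlen
        dropped = self._frames[0] if full else None
        self._frames.append(frame)  # a full deque drops its leftmost item
        return dropped

    def pop(self):
        """Return the oldest stored frame, or None when the queue is empty."""
        return self._frames.popleft() if self._frames else None


if __name__ == "__main__":
    queue = VoiceReceiveQueue(capacity=2)
    for name in ("frame-1", "frame-2", "frame-3"):
        print(name, "-> dropped:", queue.push(name))
```

In the usage loop, the third push reports that `frame-1` was discarded, which corresponds to a sender that is running ahead of the server's reception interval.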

The excess factor value is expressed as a time value. For example, if the reception synchronization signal interval of the server 200 is 10 and the synchronization signal interval of the data being received too quickly is 9, the excess factor value is 1.

The voice data correction/queueing module 201 likewise assigns a delay factor value to data received more slowly than the voice data reception synchronization signal interval of the server 200 (data whose amount falls short of the amount that normally arrives during one reception synchronization signal interval).

Like the excess factor value, the delay factor value is expressed as a time value. For example, if the reception synchronization signal interval of the server 200 is 10 and the synchronization signal interval of the data being received too slowly is 12, the delay factor value is -2.
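Both worked examples are consistent with taking the factor value as the server's reception interval minus the sender's interval. The sketch below encodes that reading; the function names are hypothetical, and the patent gives only the two numeric examples rather than an explicit formula.

```python
def timing_factor(server_interval: float, sender_interval: float) -> float:
    """Positive result: excess factor value (data arrives too fast).
    Negative result: delay factor value (data arrives too slowly)."""
    return server_interval - sender_interval


def factor_kind(factor: float) -> str:
    """Name the kind of factor value, following the sign convention above."""
    if factor > 0:
        return "excess"
    if factor < 0:
        return "delay"
    return "in sync"


if __name__ == "__main__":
    for sender in (9, 12, 10):
        value = timing_factor(10, sender)
        print(sender, value, factor_kind(value))
```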

After assigning an excess factor value or a delay factor value to each user's voice data, the voice data correction/queueing module 201 writes (queues) the data to the voice reception queue 202.

The decompression module 203 reads each user's compressed voice data from the voice reception queue 202 and decompresses it. The decompressed data is the user's original voice data in the form produced by the voice digitization unit 101 of the user terminal 100, that is, PCM code, and it is written to the original voice queue 204.

The voice data synthesis module 205 synthesizes the decompressed voice data of the users to generate synthesized voice data and queues it in the overall synthesized voice queue 206. Synthesis is performed by adding together the PCM codes of the individual original voice data. The voice data synthesis module 205 first reads the original voice data from the original voice queue 204 at fixed time intervals. If the PCM code of each original voice stream is in 16-bit integer format, the synthesized voice data obtained by summing the streams is also kept as 16-bit integers, so it can be transmitted to each user over a single bandwidth. If a sum exceeds 16 bits, it is clipped to the maximum value a 16-bit integer can hold, producing 16-bit integer synthesized voice data. Such synthesis is feasible because, unlike other kinds of data (images, text, and so on), individual voices are known to remain recognizable without difficulty even after they have been mixed.
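The mixing step, sample-wise addition of 16-bit PCM with clipping, can be sketched as follows. The text mentions clipping only at the 16-bit maximum; clamping symmetrically at the minimum as well is an assumption made here, and the function name `mix_frames` is hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames):
    """Add equally long frames of 16-bit PCM samples and clip each sum
    so the mixed frame still fits in 16-bit integers."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must contain the same number of samples")
    return [
        max(INT16_MIN, min(INT16_MAX, sum(samples)))  # clip out-of-range sums
        for samples in zip(*frames)
    ]


if __name__ == "__main__":
    user_a = [1000, 30000, -20000]
    user_b = [2000, 10000, -20000]
    print(mix_frames([user_a, user_b]))  # [3000, 32767, -32768]
```

Because the mixed frame has the same sample width and length as any single input frame, it occupies one channel regardless of how many users contributed to it.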

The final synthesized data extraction module 207 removes each user's own voice data from the synthesized voice data, extracts the final synthesized data to be transmitted to that user, and queues it in the final synthesized voice queue 208. It reads the PCM code of the synthesized voice data from the overall synthesized voice queue 206 and the PCM code of the user's own voice data from the original voice queue 204, takes the difference of the two PCM codes as the final synthesized data, and writes it to the final synthesized voice queue 208. If the synthesized voice data were sent to each user without removing that user's own voice, the data would contain the user's own voice and the user would hear it back like an echo (a howling effect). The user's own voice data is removed to prevent this.
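A minimal sketch of this subtraction step, assuming frames are lists of 16-bit PCM samples. The function name is hypothetical, and the result is clamped to the 16-bit range as a precaution the text does not mention. Note that if the mix was clipped during synthesis, the difference no longer equals the exact sum of the other users' samples.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def remove_own_voice(mixed, own):
    """Subtract one user's samples from the mixed frame so that user
    does not hear their own voice played back."""
    if len(mixed) != len(own):
        raise ValueError("frames must contain the same number of samples")
    return [max(INT16_MIN, min(INT16_MAX, m - o)) for m, o in zip(mixed, own)]


if __name__ == "__main__":
    user_a, user_b, user_c = [100, -200], [300, 400], [-50, 25]
    mixed = [a + b + c for a, b, c in zip(user_a, user_b, user_c)]
    print(mixed)                            # [350, 225]
    print(remove_own_voice(mixed, user_a))  # [250, 425], i.e. user_b + user_c
```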

A user's voice data may be received more slowly or more quickly than the voice data reception synchronization signal interval of the server 200, and, as described above, the voice data correction/queueing module 201 assigns a delay factor value or an excess factor value to compensate. The synchronization module 209 takes these values into account to calculate a corrected transmission synchronization signal interval, reads each user's synthesized data from the final synthesized voice queue 208 at that corrected interval, and passes it to the compression module 210. The corrected transmission synchronization signal interval is calculated as follows.

Corrected transmission synchronization signal interval = server's own transmission synchronization signal interval + (average delay factor value + average excess factor value) * correction factor value

Here the average delay factor value is the mean of the delay factor values over the voice data streams that were assigned one, and the average excess factor value is the mean of the excess factor values over the voice data streams that were assigned one. The correction factor value is the deviation of the two factor values converted into time and is determined experimentally. If it is too large, the synchronization signal fluctuates severely, so a suitable value must be chosen. In particular, fixing a maximum and a minimum for the corrected transmission synchronization signal interval in advance prevents the distortion of the synthesized voice that can occur when the correction factor value is very large.

For example, if the server's own transmission synchronization signal interval is 10, the average delay factor value is -2, the average excess factor value is 1, and the correction factor value is 0.2, the corrected transmission synchronization signal interval given by the above formula is 9.8. If the minimum synchronization signal interval is set to 9.9, the calculated interval is adjusted to 9.9.
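The formula and the clamping step in this example can be reproduced with a short sketch. The function and parameter names are hypothetical; only the formula and the numbers come from the text.

```python
def corrected_send_interval(base_interval, delay_avg, excess_avg, correction,
                            min_interval=None, max_interval=None):
    """base + (average delay factor + average excess factor) * correction,
    optionally limited to a preset [min_interval, max_interval] range."""
    interval = base_interval + (delay_avg + excess_avg) * correction
    if min_interval is not None:
        interval = max(interval, min_interval)
    if max_interval is not None:
        interval = min(interval, max_interval)
    return interval


if __name__ == "__main__":
    # 10 + (-2 + 1) * 0.2, about 9.8 before any limit is applied
    print(round(corrected_send_interval(10, -2, 1, 0.2), 6))
    # with a minimum of 9.9 the result is raised to 9.9
    print(corrected_send_interval(10, -2, 1, 0.2, min_interval=9.9))
```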

The final synthesized data passed to the compression module 210 is compressed by a predetermined compression scheme and transmitted to each user by the transmission module 211. At each user terminal it is decompressed by the voice data decompression unit 104, and the decompressed data is converted to analog voice by the voice analog conversion unit 103 and played back.

In summary, the multi-party voice chat data processing apparatus of the present invention is basically implemented in the configuration of FIG. 3, that is, the chat server relay method, but it differs from existing chat servers in that the server internally synthesizes the voice data transmitted from each user in real time to generate a synthesized voice channel of a single bandwidth. An existing chat server merely receives voice data from the chat participants and relays it without further processing. The chat server of the present invention instead receives voice data from multiple users and combines it into single synthesized voice data, and the final synthesized data transmitted to each chat participant occupies a single channel of a single bandwidth, so every chat participant can receive the voice data of the conversation using only one channel of bandwidth. Therefore, even in a chat with a nearly unlimited number of participants, each participant can take part using only the bandwidth of a single channel.
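The contrast drawn here between plain relaying and server-side mixing can be illustrated by comparing per-user reception bandwidth. The sketch assumes the 12 kbps channel used earlier in the text and follows the text's accounting of one relayed channel per participant; the function names are hypothetical.

```python
def relay_receive_kbps(participants: int, channel_kbps: float = 12.0) -> float:
    """Conventional relay server: reception grows with the participant count."""
    return participants * channel_kbps


def mixing_receive_kbps(participants: int, channel_kbps: float = 12.0) -> float:
    """Mixing server: one synthesized channel, whatever the participant count."""
    return channel_kbps


if __name__ == "__main__":
    for n in (2, 4, 10, 50):
        print(n, relay_receive_kbps(n), mixing_receive_kbps(n))
```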

FIG. 5 shows a conceptual configuration in which the present invention is implemented, drawn to correspond to FIG. 3.

Although preferred embodiments of the present invention have been described in detail above, a person of ordinary skill in the art to which the invention pertains will appreciate that the invention may be modified or varied in many ways without departing from the spirit and scope of the invention as defined in the appended claims. Accordingly, modifications of the foregoing embodiments do not depart from the technology of the present invention.

The present invention provides the following effects.

Voice data received from multiple users is combined into single synthesized voice data, and the final synthesized data transmitted to each chat participant occupies a single channel of a single bandwidth, so the load of processing received data on each chat participant's terminal is significantly reduced. In addition, because every chat participant can process the voice data with only a single bandwidth, there is no need to limit the number of chat participants.

These properties of the present invention apply not only to chat but also to teleconferencing within large corporations or between multiple groups.

Claims

A multi-party voice chat data processing apparatus comprising a plurality of users participating in a voice chat and a chat server that synthesizes the voice data transmitted from the respective users to generate synthesized voice data of a single bandwidth and transmits it to the respective users, wherein:

the chat server comprises:

a voice data correction/queueing module that, to correct the asynchrony of the per-user voice data arriving asynchronously over the network, assigns an adjustment factor value to the voice data, giving an excess factor value to voice data whose incoming amount is larger than the amount that normally arrives during a reception synchronization signal interval of the apparatus and a delay factor value to voice data whose incoming amount is smaller than that amount, synchronizes the data accordingly, and queues it in a voice reception queue;

a decompression module for decompressing the queued voice data of each user;

a voice data synthesis module for synthesizing the decompressed voice data of each user to generate synthesized voice data;

a final synthesized data extraction module for extracting final synthesized data by removing each user's own voice data from the synthesized voice data;

a synchronization module for calculating a corrected transmission synchronization signal interval in which the adjustment factor value is reflected;

a compression module for compressing the final synthesized data; and

a transmission module for transmitting the compressed final synthesized data to the respective users at the corrected transmission synchronization signal interval.

delete

The apparatus of claim 1, wherein the synthesized speech data is generated by adding PCM codes of the voice data of each user.

The apparatus of claim 1, wherein the final synthesized data is generated as a difference between a PCM code of the synthesized voice data and a PCM code of the voice data for each user.

6. The apparatus of any one of claims 1 to 5, wherein the corrected transmission synchronization signal interval is calculated by the following equation from the apparatus's own transmission synchronization signal interval, the average of the delay factor values, the average of the excess factor values, and a correction factor value, which is the deviation of the two factor values converted into time.

Corrected transmission synchronization signal interval = transmission synchronization signal interval of the apparatus itself + (average delay factor value + average excess factor value) * correction factor value.