KR20230154520A - Processing apparatus for avoiding voice delay and saving transmission data

Info

Publication number: KR20230154520A
Application number: KR1020220054003A
Authority: KR
Inventors: 김제성; 박홍석
Original assignee: (주)포앤비
Priority date: 2022-05-02
Filing date: 2022-05-02
Publication date: 2023-11-09

Abstract

The present invention relates to a processing apparatus for avoiding voice delay and saving transmission data. By providing a silence section discriminator in each of the transmitting-side and receiving-side devices, the output side prevents voice output delay without degrading sound quality, and the transmitting side reduces the amount of data transmitted, thereby reducing the network traffic caused by voice data transmission.

Description

Processing device for preventing voice delay and saving transmission data {PROCESSING APPARATUS FOR AVOIDING VOICE DELAY AND SAVING TRANSMISSION DATA}

The present invention relates to a voice transmission and reception system, and more specifically to a processing apparatus for avoiding voice delay and saving transmission data.

To transmit a voice analog signal digitally, a voice transmission and reception system passes the signal through a pulse code modulation (PCM) process between the transmitter and the receiver, as shown in FIG. 1 (see Jinkyu Kwak and Jongdu Lee, "Information and Communication Technology", Bokdu Publishing, 2018, p. 45, and the background art of Korean Registered Patent No. 10-2050911).

The human voice frequency band typically runs from 300 Hz to 3400 Hz, a bandwidth of about 3 kHz. Adding a guard band to avoid interference between channels brings the bandwidth to about 4 kHz. To reproduce such a voice signal, the sampling rate must be at least twice the bandwidth: sampling at 8 kHz or more, that is, at least 8,000 times per second, allows the original signal to be restored. Each sample becomes one piece of voice data. If each sample is converted into an 8-bit digital signal (16-bit, 32-bit, and so on are of course also possible), the data is transmitted at 64 kbit/s. For convenience, FIG. 1 shows an example in which 4 bits are allocated to each sample.
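The bit-rate figure above follows directly from the sampling rate and the sample width. The short sketch below only restates that arithmetic; the function name `pcm_bit_rate` is illustrative and does not appear in the disclosure.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed PCM bit rate in bits per second for a single channel."""
    return sample_rate_hz * bits_per_sample


# About 3 kHz of voice band plus a guard band gives roughly 4 kHz.
voice_bandwidth_hz = 4_000
# Sampling at twice the bandwidth gives 8,000 samples per second.
sample_rate_hz = 2 * voice_bandwidth_hz

print(pcm_bit_rate(sample_rate_hz, 8))   # 8-bit samples
print(pcm_bit_rate(sample_rate_hz, 16))  # 16-bit samples double the rate
```

With 8-bit samples this gives the 64 kbit/s stated above; wider samples scale the rate proportionally.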

When the converted digital signal, that is, the digitized voice data, is transmitted over a network such as the Internet, voice delay occurs during transmission and reception. Voice delay here refers to the condition in which the gap between the time the sender transmits voice data and the time the receiver, after receiving it, either forwards it to another device or outputs it through a device such as a speaker grows large enough to interfere with conversation.

This voice delay arises in the following cases: 1) delay accumulates because a processor such as a CPU (central processing unit) needs time to handle the input, compression, transmission, reception, decompression, and output of voice data; or 2) a network failure temporarily blocks data transmission, and when the failure clears, the accumulated data is delivered to the receiver all at once.

To eliminate this voice delay, the accumulated voice data must be discarded rather than output. If voice data is simply discarded without processing, however, the voice output is no longer continuous: the speech is cut off and the listener cannot tell what the other party said.

To solve this problem, Korean Patent Publication No. 10-2019-0025334 proposed classifying received voice data into voice sections and silence sections, and either dropping (not outputting) the voice data classified as silence or outputting it at an accelerated playback speed. In that approach, however, the transmitter sends all voice data, including silence data, to the receiver; the receiver stores it in a storage unit, and a section classification unit then classifies it into voice sections and silence sections. Silence data that will be dropped at the output device is therefore still transmitted, which generates unnecessary network traffic. As the classification method, the publication discloses mechanically dividing the received and stored voice data into equal segments of a preset length (for example, 10 ms) and classifying each segment as a voice section or a silence section. It further discloses that, depending on whether a classified silence section exceeds preset first and second reference times, the voice data is output as is, output at an accelerated playback speed, or dropped, which makes the voice data processing device complicated.

The present invention is proposed to solve the above problems of the prior art. It provides a processing apparatus for avoiding voice delay and saving transmission data in which the transmitting device determines whether the input voice data contains a silence section, discards the silence data, and transmits only the remaining voice data for output.

To achieve the above object, a voice data input and transmission device according to the present invention comprises: an input unit that receives voice data; a transmitting-side silence section discriminator that finds silence sections in the input voice data; and a transmitter that removes from the voice data the silence data corresponding to the silence sections found by the transmitting-side silence section discriminator and transmits only the remaining, primarily selected voice data to a network.

The transmitting-side silence section discriminator may be configured to sample and quantize the input analog voice data and then, where the quantized samples are smaller than a set threshold line and therefore lie between neighboring left and right boundary lines, to determine as a silence section the span from a left dividing line, located a certain time after the left boundary line, to a right dividing line, located a certain time before the right boundary line.

A voice data reception and output device according to the present invention is connected to the above voice data input and transmission device through the network and comprises: a receiving unit that receives the primarily selected voice data transmitted over the network; a receiving-side silence section discriminator that finds silence sections in the received primarily selected voice data; and an output unit that removes from the primarily selected voice data the silence data corresponding to the silence sections found by the receiving-side silence section discriminator and outputs only the remaining, secondarily selected voice data.

The receiving-side silence section discriminator may be configured to decode the received digital voice data and then, where the decoded samples are smaller than a set threshold line and therefore lie between neighboring left and right boundary lines, to determine as a silence section the span from a left dividing line, located a certain time after the left boundary line, to a right dividing line, located a certain time before the right boundary line.

A voice transmission and reception system according to the present invention includes the voice data input and transmission device and the voice data reception and output device described above, wherein the threshold line is adjusted according to the level of ambient noise within 5% of the maximum size of the quantized samples or of the decoded samples.

By providing a silence section discriminator in each of the transmitting-side and receiving-side devices, the present invention prevents voice output delay on the output side without degrading sound quality and reduces the amount of data transmitted on the transmitting side, thereby reducing the network traffic caused by voice data transmission.

FIG. 1 is a conceptual diagram showing the pulse code modulation (PCM) process between the transmitter and the receiver by which a voice transmission and reception system transmits a voice analog signal digitally.
FIG. 2 is a conceptual diagram showing the process from voice input through transmission and reception to output in a voice transmission and reception system according to an embodiment of the present invention.
FIGS. 3 and 4 are spectra showing the distribution over time of the sizes of quantized or decoded samples when voice analog data is sampled at 16 bits and 8 bits, respectively, according to an embodiment of the present invention.
FIG. 5(a) shows voice data transmitted or received with the conventional processing method, and FIG. 5(b) shows together the silence data (303) that is discarded (not transmitted or output) and the remaining voice data (304) that is transmitted or output according to an embodiment of the present invention.
FIG. 6 is a conceptual diagram of operation showing the problem of the conventional processing method: silence data is also transmitted and output through the speaker, yet produces no sound and only generates network traffic through unnecessary data transmission.
FIG. 7 is a conceptual diagram of operation showing that, according to an embodiment of the present invention, silence data is not transmitted at all and therefore does not accumulate at the receiver, which both eliminates voice output delay and removes the network traffic caused by unnecessary data transmission.

Hereinafter, preferred embodiments of the present invention are described with reference to the accompanying drawings.

As shown in FIG. 2, a voice transmission and reception system according to an embodiment of the present invention consists broadly of a transmitting device (10), a network (20) such as the Internet, and a receiving device (30).

In one embodiment of the present invention, the transmitting device (10) is preferably configured as a voice data input and transmission device comprising: an input unit that receives voice data; a transmitting-side silence section discriminator that finds silence sections in the input voice data; and a transmitter that removes from the voice data the silence data corresponding to the silence sections found by the transmitting-side silence section discriminator and transmits only the remaining, primarily selected voice data to the network.

The input unit is configured to receive an analog voice signal (data) through a microphone or the like and pass it to the transmitting-side silence section discriminator.

The transmitting-side silence section discriminator may be configured to sample and quantize the input analog voice data, as on the left side of FIG. 1. Then, as shown in FIGS. 3 and 4, where the quantized samples are smaller than the size (B) of a set threshold line (201) and therefore lie between neighboring left and right boundary lines (202a, 202b), it determines as a silence section (204) the span from a left dividing line (203a), located a certain time (C) after the left boundary line (202a), to a right dividing line (203b), located a certain time before the right boundary line (202b).

FIGS. 3 and 4 are spectra showing the distribution over time of the sizes of quantized or decoded samples when voice analog data is sampled at 16 bits and 8 bits, respectively, according to an embodiment of the present invention.

Here, the threshold line (201) is preferably adjusted according to the level of ambient noise within 5% of the maximum size (A) of the quantized samples. In this way, the size (B) of the threshold line (201) is set differently in environments with different ambient noise levels, such as a quiet place, a street, or a moving subway train, and the silence section (204) changes accordingly. For example, normal speech is about 65 dB, a quiet place such as a library about 40 dB, roadside noise about 70 dB, and railside noise about 80 dB. The size (B) of the threshold line (201) can therefore be set to 5% of the maximum sample size (A) indoors in a place such as a library, to 1% of the maximum sample size (A) at a railside, and to {1 + [4 × (80 − noise) / 40]}% of the maximum sample size (A) in environments in between, where noise is the ambient noise level in dB. This provides a first level of adjustment that matches the section (204) discarded as silence to the surroundings.
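The threshold rule in the example above can be sketched as a small function. This is a minimal illustration, not the disclosed implementation: the names `threshold_percent` and `threshold_size` are hypothetical, and clamping noise levels outside the 40 to 80 dB range to the nearest end of that range is an assumption made here, since the description only gives the two end points and the formula for values in between.

```python
def threshold_percent(noise_db: float) -> float:
    """Size B of the threshold line as a percentage of the maximum sample size A.

    Uses the formula from the description, 1 + 4 * (80 - noise) / 40.
    Assumption: noise levels below 40 dB or above 80 dB are clamped.
    """
    clamped = min(max(noise_db, 40.0), 80.0)
    return 1.0 + 4.0 * (80.0 - clamped) / 40.0


def threshold_size(max_sample_size: float, noise_db: float) -> float:
    """Absolute threshold B for a given maximum sample size A."""
    return max_sample_size * threshold_percent(noise_db) / 100.0


for label, noise in [("library", 40), ("roadside", 70), ("railside", 80)]:
    print(label, threshold_percent(noise))
```

The formula yields 5% at 40 dB and 1% at 80 dB, matching the library and railside values in the text, and falls linearly between them.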

In addition, by having the silence section (204) run from the left dividing line (203a), located a certain time (C) after the left boundary line (202a), to the right dividing line (203b), located a certain time before the right boundary line (202b), the section (204) discarded as silence is adjusted a second time. The time (C) between the left boundary line (202a) and the left dividing line (203a) may differ from the time between the right boundary line (202b) and the right dividing line (203b), but both are preferably the same and within 200 ms. The time (C) may of course be made longer as the size (B) of the set threshold line (201) increases.
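One way to read the boundary-line and dividing-line rule is as a run-length scan over the samples. The sketch below is an interpretation under stated assumptions, not the disclosed implementation: the function name `find_silence_sections` is hypothetical, the same margin is used on both sides, the margin is expressed in samples rather than milliseconds, and the end of the data is treated as a right boundary line.

```python
from typing import List, Sequence, Tuple


def find_silence_sections(
    samples: Sequence[int], threshold: float, margin_samples: int
) -> List[Tuple[int, int]]:
    """Return half-open index ranges [start, end) judged to be silence.

    A run of samples with magnitude below `threshold` lies between a left
    and a right boundary line (202a, 202b). The silence section (204) is the
    run shortened by `margin_samples` on each side, i.e. the span between
    the left and right dividing lines (203a, 203b).
    """
    sections: List[Tuple[int, int]] = []
    run_start = None
    # The trailing None is a sentinel that closes a run reaching the end.
    for i, s in enumerate(list(samples) + [None]):
        quiet = s is not None and abs(s) < threshold
        if quiet and run_start is None:
            run_start = i                       # left boundary line
        elif not quiet and run_start is not None:
            left = run_start + margin_samples   # left dividing line
            right = i - margin_samples          # right dividing line
            if left < right:                    # run long enough to keep a section
                sections.append((left, right))
            run_start = None
    return sections


signal = [100] * 3 + [1] * 10 + [100] * 3
print(find_silence_sections(signal, threshold=5, margin_samples=2))
```

At an 8 kHz sampling rate, the 200 ms upper bound on the time (C) corresponds to at most 1,600 samples of margin. Quiet runs shorter than twice the margin produce no silence section, so brief pauses inside speech are left untouched.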

The transmitter may be configured to remove the silence data (quantized samples) corresponding to the silence sections found by the transmitting-side silence section discriminator, encode only the remaining, primarily selected voice data, and transmit it to the network (20).
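The removal step can be sketched as follows, assuming the silence sections are already available as half-open index ranges. The function name `drop_silence` and the range representation are illustrative assumptions; the encoding and network transmission steps are outside the scope of this sketch.

```python
from typing import List, Sequence, Tuple


def drop_silence(
    samples: Sequence[int], silence_sections: Sequence[Tuple[int, int]]
) -> List[int]:
    """Return the samples that remain after removing the given silence sections.

    `silence_sections` holds half-open index ranges [start, end).
    """
    kept: List[int] = []
    cursor = 0
    for start, end in sorted(silence_sections):
        kept.extend(samples[cursor:start])  # voice data before this section
        cursor = max(cursor, end)           # skip the silence data
    kept.extend(samples[cursor:])           # voice data after the last section
    return kept


samples = list(range(10))
selected = drop_silence(samples, [(2, 5), (7, 9)])
saved = 1 - len(selected) / len(samples)
print(selected, saved)
```

The ratio `saved` gives the fraction of samples that never reaches the network, which is the source of the traffic reduction described above.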

Depending on the embodiment, the transmitting-side silence section discriminator and/or the transmitter may further include a storage means such as a buffer to perform the functions described above.

In another embodiment of the present invention, the receiving device (30) is preferably configured as a voice data reception and output device that is connected to the above voice data input and transmission device (10) through the network (20) and comprises: a receiving unit that receives the primarily selected voice data transmitted over the network; a receiving-side silence section discriminator that finds silence sections in the received primarily selected voice data; and an output unit that removes from the primarily selected voice data the silence data corresponding to the silence sections found by the receiving-side silence section discriminator and outputs only the remaining, secondarily selected voice data.

The receiving unit is configured to receive the primarily selected voice data (encoded digital voice data) transmitted over the network and pass it to the receiving-side silence section discriminator.

The receiving-side silence section discriminator may be configured to decode the received digital voice data, as on the right side of FIG. 1. Then, likewise as shown in FIGS. 3 and 4, where the decoded samples are smaller than the size (B) of the set threshold line (201) and therefore lie between neighboring left and right boundary lines (202a, 202b), it determines as a silence section (204) the span from the left dividing line (203a), located a certain time (C) after the left boundary line (202a), to the right dividing line (203b), located a certain time before the right boundary line (202b).

The receiving-side silence section discriminator determines the silence section (204) in the same way as the transmitting-side silence section discriminator described above, so that silence sections can be found a second time in the received digital voice data. The threshold line (201), the silence section (204), the time (C) between the left boundary line (202a) and the left dividing line (203a), and so on are as described for the transmitting-side silence section discriminator, so their description is omitted here.

The output unit may be configured to remove, from the samples obtained by decoding the received primarily selected voice data, the silence data (decoded samples) corresponding to the silence sections found by the receiving-side silence section discriminator, to form analog output data from only the remaining, secondarily selected voice data (by integrating it), and to output the result to an output device such as a speaker.

Depending on the embodiment, the receiving unit, the receiving-side silence section discriminator, and/or the output device may further include a storage means such as a buffer to perform the functions described above.

FIG. 5(a) shows voice data transmitted or received with the conventional processing method, and FIG. 5(b) shows together the silence data (303) that is discarded (not transmitted or output) and the remaining voice data (304) that is transmitted or output according to an embodiment of the present invention.

FIG. 6 is a conceptual diagram of the operation of a voice transmission and reception system in which, following the conventional processing method, silence data is also transmitted and output through the speaker. Each square box represents one second of voice data. Even though the person made no sound at the input unit, the silence data (7, 8) is still transmitted, accumulates in a storage means such as a buffer of the receiver, and is then output through the output unit (speaker). The silence data (7, 8) is output after voice data 6, but because its amplitude is close to 0, nothing is heard. Under the conventional processing method, therefore, transmitting the silence data (7, 8) both causes output delay through data accumulated at the receiver and generates network traffic through unnecessary data transmission.

FIG. 7, by contrast, is a conceptual diagram of the operation of a voice transmission and reception system in which, according to an embodiment of the present invention, silence data is not transmitted at all. Because the silence data is never transmitted, it does not accumulate at the receiver. This removes the data backlog at the receiver and thus the voice output delay, and it can also help with the "Internet failure" and "processing" delays that have been the causes of voice delay.
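The difference between FIG. 6 and FIG. 7 can be illustrated with a toy playout model. Everything in this sketch is an assumption made for illustration: the function name `playout_start_times`, the arrival times, and the scenario in which a network stall releases frames 5 to 8 together at t = 9 s. Only the one-second frame length and the silence frames 7 and 8 come from the description of FIG. 6.

```python
from typing import Dict, List, Tuple


def playout_start_times(
    arrivals: List[Tuple[int, float]], frame_seconds: float = 1.0
) -> Dict[int, float]:
    """Map each frame id to the time the speaker starts playing it.

    `arrivals` lists (frame_id, arrival_time) in send order. A frame starts
    when it has arrived and the previous frame has finished playing.
    """
    starts: Dict[int, float] = {}
    speaker_free_at = 0.0
    for frame_id, arrival in arrivals:
        start = max(arrival, speaker_free_at)
        starts[frame_id] = start
        speaker_free_at = start + frame_seconds
    return starts


# Hypothetical burst: a stall releases frames 5..8 at t = 9 s; frame 9 follows at t = 10 s.
conventional = [(5, 9.0), (6, 9.0), (7, 9.0), (8, 9.0), (9, 10.0)]
silence_frames = {7, 8}
proposed = [(f, t) for f, t in conventional if f not in silence_frames]

print(playout_start_times(conventional)[9])  # frame 9 waits behind frames 5 to 8
print(playout_start_times(proposed)[9])      # frame 9 waits behind frames 5 and 6 only
```

In this model, frame 9 starts two seconds earlier when the silence frames are never sent, because the receiver's backlog is shorter by exactly the two silent seconds.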

The present invention has been described above with reference to the accompanying drawings, focusing on preferred embodiments. For anything not described or insufficiently described, reference may be made to Korean Registered Patent No. 10-2050911, in whose development the present inventors took part, and to Korean Patent Publication No. 10-2019-0025334, which is also mentioned in the background art.

10: Transmitting device (voice data input and transmission device)
20: Network (Internet, etc.)
30: Receiving device (voice data reception and output device)

Claims

1. A voice data input and transmission device comprising:
an input unit that receives voice data;
a transmitting-side silence section discriminator that finds silence sections in the input voice data; and
a transmitter that removes from the voice data the silence data corresponding to the silence sections found by the transmitting-side silence section discriminator and transmits only the remaining, primarily selected voice data to a network.

2. A voice data reception and output device connected to the voice data input and transmission device of claim 1 through the network, comprising:
a receiving unit that receives the primarily selected voice data transmitted over the network;
a receiving-side silence section discriminator that finds silence sections in the received primarily selected voice data; and
an output unit that removes from the primarily selected voice data the silence data corresponding to the silence sections found by the receiving-side silence section discriminator and outputs only the remaining, secondarily selected voice data.

3. The voice data input and transmission device of claim 1,
wherein the transmitting-side silence section discriminator is configured to sample and quantize the input analog voice data and then, where the quantized samples are smaller than a set threshold line and therefore lie between neighboring left and right boundary lines, to determine as a silence section the span from a left dividing line, located a certain time after the left boundary line, to a right dividing line, located a certain time before the right boundary line.

4. The voice data reception and output device of claim 2,
wherein the receiving-side silence section discriminator is configured to decode the received digital voice data and then, where the decoded samples are smaller than a set threshold line and therefore lie between neighboring left and right boundary lines, to determine as a silence section the span from a left dividing line, located a certain time after the left boundary line, to a right dividing line, located a certain time before the right boundary line.

5. A voice transmission and reception system including the voice data input and transmission device of claim 3 and the voice data reception and output device of claim 4,
wherein the threshold line is adjusted according to the level of ambient noise within 5% of the maximum size of the quantized samples or the maximum size of the decoded samples.